Preface


The intention of this textbook is to provide a concise explanation of fundamentals 
and background of the surround sound recording and playback technology 
Ambisonics. 

Although Ambisonic technology has been practiced in the academic world for
quite some time, it is only now that the recent ITU,¹ MPEG-H,² and ETSI³
standards firmly establish it in the production and media broadcasting world.

What is more, Internet giants Google/YouTube recently recommended the use of tools
that have been adopted from what the academic world is currently using.⁴,⁵

Last but most importantly, recent advancements have boosted the usability of
Ambisonic technology: ways to obtain safe Ambisonic decoders,⁶ the availability
of higher-order Ambisonic main microphone arrays (Eigenmike,⁸ Zylia⁹) and their
filter-design theory, and, above all, plugins that integrate higher-order
Ambisonic production into digital audio workstations or mixers.⁷,¹⁰,¹¹,¹²,¹³,¹⁴,¹⁵
This progress was a great motivation to write a book about the basics.


¹ https://www.itu.int/rec/R-REC-BS.2076/en.
² https://www.iso.org/standard/69561.html.
³ https://www.techstreet.com/standards/etsi-ts-103-491?product_id=1987449.
⁴ https://support.google.com/jump/answer/6399746?hl=en.
⁵ https://developers.google.com/vr/concepts/spatial-audio.
⁶ https://bitbucket.org/ambidecodertoolbox/adt.git.
⁷ https://plugins.iem.at/.
⁸ https://mhacoustics.com/products.
⁹ https://www.zylia.co.
¹⁰ http://www.matthiaskronlachner.com/?p=2015.
¹¹ http://www.blueripplesound.com/product-listings/pro-audio.
¹² https://b-com.com/en/bcom-spatial-audio-toolbox-render-plugins.
¹³ https://harpex.net/.
¹⁴ http://forumnet.ircam.fr/product/panoramix-en/.
¹⁵ http://research.spa.aalto.fi/projects/sparta_vsts/.
The book is dedicated to providing a deeper understanding of Ambisonic
technologies, especially, but not exclusively, for readers who are scientists,
audio-system engineers, and audio recording engineers. Since the underlying
maths would at times get too long for practical readability, the book comes with
a comprehensive appendix containing the beautiful mathematical details.

For a common understanding, the introductory section spans a perspective on 
Ambisonics from its origins in coincident recordings from the 1930s, to the 
Ambisonic concepts from the 1970s, and to classical ways of applying Ambisonics 
in first-order coincident sound scene recording and reproduction that have been 
practiced from the 1980s on. 

In its main contents, this book intends to provide all psychoacoustical, signal 
processing, acoustical, and mathematical knowledge needed to understand the inner 
workings of modern processing utilities, special equipment for recording, manip- 
ulation, and reproduction in the higher-order Ambisonic format. As advanced 
outcomes, the aim of the book is to explain higher-order Ambisonic decoding, 3D 
audio effects, and higher-order Ambisonic recording with microphones or main 
microphone arrays. Those techniques are shown to be suitable to supply audience
areas ranging from studio-sized to hundreds of listeners, or headphone-based
playback, regardless of whether the 3D audio material is live, interactive, or
studio-produced.

The book comes with various practical examples based on free software tools 
and open scientific data for reproducible research. 

Our Ambisonic events experience: In the past years, we have contributed to
organizing Symposia on Ambisonics (Ambisonics Symposium 2009 in Graz, 2010
in Paris, 2011 in Lexington, 2012 in York, 2014 in Berlin), and we have
demonstrated and brought the technology to various winter/summer schools and
conferences (EAA Winter School Merano 2013, EAA Symposium Berlin 2014,
workshops and Ambisonic music repertory demonstration at Darmstädter
Ferienkurse für Neue Musik in 2014, ICAD workshop in Graz 2015, ICSA workshop
2015 in Graz with PURE Ambisonics night, summer school at ICSA 2017 in Graz, a
course at Kraków film music festival 2015, mAmbA demo facility at DAGA in Aachen
2016, Al Di Meola’s live 3D audio concert hosted in Graz in June 2016, and AES
Convention Milano 2018).

In 2017 (ICSA Graz) and 2018 (TMT Cologne), we initiated and organized
Europe’s First and Second Student 3D Audio Production Competition together with
Markus Zaunschirm and Daniel Rudrich.


Graz, Austria Franz Zotter 
February 2019 Matthias Frank 
Outline


First-order Ambisonics is nowadays strongly revived by internet technology
supported by Google/YouTube, Facebook 360°, 360° audio and video recording and
rendering, as well as VR in games. This renaissance is due to two benefits:
(i) compact main microphone arrays capture the entire surrounding sound scene in
only four audio channels (e.g., Zoom H3-VR, Oktava A-Format Microphone, Røde
NT-SF1, Sennheiser AMBEO VR Mic), and (ii) the sound scene is easily rotated,
which allows rendering surround audio scenes, e.g., on head-tracked
headphones, head-mounted AR/VR sets, or mobile devices, as described in Chap. 1.

Auditory events and vector-base panning: Chapter 2 of this book is dedicated to 
conveying a comprehensive understanding of the localization impressions in 
multi-loudspeaker playback and its models, followed by Chap. 3 that outlines the 
essentials of practical vector panning models and their extensions by downmix from 
imaginary loudspeakers, which are both fundamental to contemporary Ambisonics. 

Harmonic functions, Ambisonic encoding and decoding: Based on the ideals of 
accurate localization with panning-invariant loudness and perceived width, Chap. 4 
provides a profound mathematical derivation of higher-order Ambisonic panning 
functions in 2D and 3D in terms of angular harmonics. These idealized functions
can be maximized in their directional focus (max-$r_\mathrm{E}$) and they are strictly limited in
their directional resolution. This resolution limit entails perfectly well-defined
constraints on loudspeaker layouts that make us reach ideal measures for accurate 
localization as well as panning-invariant loudness and width. And what is highly 
relevant for practical decoding: All-Round Ambisonic decoding to loudspeakers 
and TAC/MagLS decoders for headphones are explained in Chap. 4. 

The Ambisonic signal processing chain and effects are described in Chap. 5. It 
illustrates the signal flow from source encoding through Ambisonic bus to decoding 
and where input-specific or general insert and auxiliary Ambisonic effects are 
located. In particular, the chapter describes the working principles behind 
frequency-independent manipulation effects that are either mirroring/rotating/ 
re-mapping, warping, or directionally weighting, or such effects that are 
frequency-dependent. Frequency-dependent effects can introduce widening, depth 
or diffuseness, convolution reverb, or feedback-delay-network (FDN)-based diffuse 
reverberation. Directional resolution enhancements are outlined in terms of
SDM/SIRR pre-processing of recorded reverberation and in terms of available tools 
such as HARPEX, DirAC, and COMPASS for recorded signals. 

Compact higher-order Ambisonic microphones rely on the solutions of the 
Helmholtz equation, and their processing uses a frequency-independent decom- 
position of the spherical array signals into spherical harmonics and the 
frequency-dependent radial-focusing filtering associated with each spherical har- 
monic order, which yield the Ambisonic signals. The critical part is to handle the 
properties of radial-focusing filters in the processing of higher-order Ambisonic 
microphone arrays (e.g., the Eigenmike). To keep the noise level and the sidelobes
in the recordings low and to maintain a balanced frequency response, a careful
approach to radial filter design is outlined in Chap. 6.

Compact higher-order loudspeaker arrays are the counterpart to the otherwise
inwards-oriented Ambisonic surround playback, as described in Chap. 7. This final,
forward-looking chapter discusses IKO and loudspeaker cubes as compact spherical loudspeaker
arrays with Ambisonically controlled radiation patterns. In natural environments
with acoustic reflections, such directivity-controlled arrays have their own 
sound-projecting and distance-changing effects, and they can be used to simulate 
sources of specific directivity patterns. 
Chapter 1
XY, MS, and First-Order Ambisonics


Directionally sensitive microphones may be of the light moving 
strip type. [...] the strips may face directions at 45° on each side 
of the centre line to the sound source. 


Alan Dower Blumlein [1], Patent, 1931 


Abstract This chapter describes first-order Ambisonic technologies starting from 
classical coincident audio recording and playback principles from the 1930s until 
the invention of first-order Ambisonics in the 1970s. Coincident recording is based 
on arrangements of directional microphones at the smallest-possible spacings in 
between. Hereby incident sound approximately arrives with equal delay at all micro- 
phones. Intensity-based coincident stereophonic recording such as XY and MS 
typically yields stable directional playback on a stereophonic loudspeaker pair. While 
the stereo width is adjustable by MS processing, the directional mapping of first- 
order Ambisonics is a bit more rigid: the omnidirectional and figure-of-eight record- 
ing pickup patterns are reproduced unaltered by equivalent patterns in playback. In 
perfect appreciation of the benefits of coincident first-order Ambisonic recording 
technologies in VR and field recording, the chapter gives practical examples for 
encoding, headphone- and loudspeaker-based decoding. It concludes with a desire 
for a higher-order Ambisonics format to get a larger sweet area and accommo- 
date first-order resolution-enhancement algorithms, the embedding of alternative, 
channel-based recordings, etc. 


Intensity-based coincident stereophonic recording such as XY uses two
figure-of-eight microphones with an angular spacing of 90°, after Blumlein’s
original work [1] from the 1930s, see [2-4]. Another representative, MS, uses an omnidirectional and
a lateral figure-of-eight microphone [2]. Both typically yield a stable directional 
playback in stereo, but signals often get too correlated, yielding a lack in depth 
and diffuseness of the recording space when played back [5, 6] and compared to 
delay-based AB stereophony or equivalence-based alternatives. 

Gerzon’s work in the 1970s [7] gave us what we call first-order Ambisonic
recording and playback technology today. Ambisonics preserves the directional
mapping by recording and reproducing with spatially undistorted omnidirectional
and figure-of-eight patterns on circularly (2D) or spherically (3D) surrounding
loudspeaker layouts.

© The Author(s) 2019


1.1 Blumlein Pair: XY Recording and Playback 


The XY technique dates back to Blumlein’s patent from the 1930s [1] and his patents
thereafter [4]. At that time, manufacturers started producing ribbon microphones,
nowadays largely outdated, which offered a means to record with figure-of-eight
pickup patterns.


Blumlein Pair using 90°-angled figure-of-eight microphones (XY). Blumlein’s
classic coincident microphone pair [3, Fig. 3] uses two figure-of-eight microphones
pointing to ±45°, see Fig. 1.1. The directional pickup pattern of each microphone
is described by $\cos\phi$, where $\phi$ is the angle enclosed by microphone aiming
and sound source. Using a mathematically positive coordinate definition for X
(front-right) and Y (front-left), with the polar angle $\varphi = 0$ aiming at the
front, the figure-of-eight X uses the angle $\phi = \varphi + 45^\circ$ and Y the
angle $\phi = \varphi - 45^\circ$, so that the pickup pattern of the microphone pair is:

$$\boldsymbol{g}_{XY}(\varphi) = \begin{bmatrix} \cos(\varphi + 45^\circ) \\ \cos(\varphi - 45^\circ) \end{bmatrix}. \tag{1.1}$$


Assuming a signal $s$ coming from the angle $\varphi$, the recorded signals are
$[X, Y]^\mathrm{T} = \boldsymbol{g}_{XY}(\varphi)\, s$. Sound sources from the left 45°,
the front 0°, and the right −45° will be received by the pair of gains:


(a) Blumlein XY pair (b) Picture of the recording setup 


Fig. 1.1 Blumlein pair consisting of 90°-angled figure-of-eight microphones 
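$$\text{right: } \boldsymbol{g}_{XY}(-45^\circ) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\text{center: } \boldsymbol{g}_{XY}(0^\circ) = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad
\text{left: } \boldsymbol{g}_{XY}(45^\circ) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$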


Obviously, a source moving from the right −45° to the left 45° pans the signal from
the channel X to the channel Y. This property provides a strongly perceivable
lateralization of lateral sources when feeding the left and right channel of a
stereophonic loudspeaker pair by Y and X, respectively.

However, ideally there should not be any dominant sounds arriving from the
sides, as for the source angles $-135^\circ < \varphi < -45^\circ$ and
$45^\circ < \varphi < 135^\circ$ the Blumlein pair produces out-of-phase signals
between X and Y. The back directions are mapped with consistent sign again,
however, left-right reversed. It is only possible to avoid this by decreasing the
angle between the microphone pair, which, however, would make the stereo image
narrower.

Therefore, coincident XY recording pairs nowadays most often use cardioid
directivities $\frac{1}{2} + \frac{1}{2}\cos\phi$ instead. They receive all directions
without sign change and easily permit stereo width adjustments by varying the
angle between the microphones.
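As a quick numerical check of Eq. (1.1), the following Python sketch evaluates the gains of the Blumlein pair. It assumes the convention stated above (X aims front-right at −45°, Y aims front-left at +45°, azimuth counted positive to the left); the function name `blumlein_xy_gains` is only illustrative and not part of any library.

```python
import math

def blumlein_xy_gains(azimuth_deg):
    """Gains (X, Y) of a Blumlein pair for a source at azimuth_deg (positive = left)."""
    phi = math.radians(azimuth_deg)
    g_x = math.cos(phi + math.radians(45.0))  # X aims front-right (-45 deg)
    g_y = math.cos(phi - math.radians(45.0))  # Y aims front-left (+45 deg)
    return g_x, g_y

# Print the gains for a few source directions, including a lateral one.
for azimuth in (-45.0, 0.0, 45.0, 90.0):
    g_x, g_y = blumlein_xy_gains(azimuth)
    print(f"{azimuth:6.1f} deg: X = {g_x:+.3f}, Y = {g_y:+.3f}")
```

For lateral source angles such as 90°, the two gains come out with opposite signs, which is the out-of-phase region discussed above.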


1.2 MS Recording and Playback 


Blumlein’s patent [1] considers sum and difference signals between a pair of chan- 
nels/microphones, yielding M-S stereophony. In M-S [8], the sum signal represents 
the mid (omnidirectional, sometimes cardioid-directional to front) and the differ- 
ence the side signal (figure-of-eight). MS recordings can also be taken with cardioid 
microphones and permit manipulation of the stereo width of the recording. 


MS recording by omnidirectional and figure-of-eight microphone (native MS). 
Mid-side recording can be done by using a pair of coincident microphones with 
an omnidirectional (mid, W) and a side-ways oriented figure-of-eight (side, Y) 
directivity, Fig. 1.2. The pair of pickup patterns is described by the vector: 




(a) Native MS recording (b) Picture of the recording setup 


Fig. 1.2 Native mid-side recording with the coincident arrangement of an omnidirectional micro- 
phone heading front and a figure-of-eight microphone heading left 
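$$\boldsymbol{g}_{WY}(\varphi) = \begin{bmatrix} 1 \\ \sin\varphi \end{bmatrix} \tag{1.2}$$

that depends on the angle $\varphi$ of the sound source. Equation (1.2) maps a single
sound $s$ from $\varphi$ to the mid W and side Y signals by the gains
$[W, Y]^\mathrm{T} = \boldsymbol{g}_{WY}(\varphi)\, s$:

$$\text{left: } \boldsymbol{g}_{WY}(90^\circ) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad
\text{right: } \boldsymbol{g}_{WY}(-90^\circ) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad
\text{center: } \boldsymbol{g}_{WY}(0^\circ) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$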


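MS recording with a pair of 180°-angled cardioids. Two coincident cardioid
microphones (cardioid directivity $\frac{1}{2} + \frac{1}{2}\cos\phi$) pointing to the
polar angles 90° (left) and −90° (right) are also applicable to mid-side recording,
Fig. 1.3. Their pickup patterns

$$\boldsymbol{g}_{C\pm90^\circ}(\varphi) = \frac{1}{2} \begin{bmatrix} 1 + \cos(\varphi - 90^\circ) \\ 1 + \cos(\varphi + 90^\circ) \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} 1 + \sin\varphi \\ 1 - \sin\varphi \end{bmatrix} \tag{1.3}$$

are encoded into the MS pickup patterns (W, Y) by a matrix

$$\boldsymbol{g}_{WY}(\varphi) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \boldsymbol{g}_{C\pm90^\circ}(\varphi)
= \begin{bmatrix} 1 \\ \sin\varphi \end{bmatrix}. \tag{1.4}$$

The matrix eliminates the cardioids’ figure-of-eight characteristics by their sum
signal, and their omnidirectional characteristics by the difference. We obtain the
MS signal pair (W, Y) from the cardioid microphone signals as

$$\begin{bmatrix} W \\ Y \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} C_{90^\circ} \\ C_{-90^\circ} \end{bmatrix}. \tag{1.5}$$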


(a) 180°angled cardioid microphones (b) Picture of the recording setup 


Fig. 1.3 Mid-side recording by 180°-angled cardioids 

(a) Changing the MS stereo width (b) MS decoding to loudspeakers 


Fig. 1.4 Change of the stereo width by modifying the balance between W and Y signals of MS 
(left). Decoding of the M/S signal pair (W, Y) to a stereo loudspeaker pair (right) 


Decoding of MS signals to a stereo loudspeaker pair. Decoding of the mid-side
signal pair to left and right loudspeaker is done by feeding both signals to both
loudspeakers, however out-of-phase for the side signal, Fig. 1.4b:

$$\begin{bmatrix} L \\ R \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} W \\ Y \end{bmatrix}. \tag{1.6}$$

An interesting aspect of MS with 180°-angled cardioid microphones is that after
inserting the XY-to-MS encoder Eq. (1.5) into the decoder Eq. (1.6), a brief
calculation shows that the matrices invert each other. In this case, the cardioid
signals are directly fed to the loudspeakers, $[L, R] = [C_{90^\circ}, C_{-90^\circ}]$.

Stereo width. Modifying the mid versus side signal balance before stereo playback,
using a blending parameter $\alpha$, allows changing the width of the stereo image
from $\alpha = 0$ (narrow) to $\alpha = 1$ (full), Fig. 1.4a, see also [9]:

$$\begin{bmatrix} L \\ R \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} 2 - \alpha & 0 \\ 0 & \alpha \end{bmatrix}
\begin{bmatrix} W \\ Y \end{bmatrix}. \tag{1.7}$$

In stereophonic MS playback, the playback loudspeaker directions at ±30° are not
identical to the peaks of the recording pickup pattern of the side channel (Y) at ±90°.


Ambisonics assumes a more strict correspondence between directional patterns of 
recording and patterns mapped on the playback system. 
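The MS encoding, decoding, and width control of Eqs. (1.5) to (1.7) can be sketched in a few lines of Python. This is only an illustration under the assumptions that the cardioids aim at ±90°, that the decoder uses the factor 1/2, and that the width stage scales W by (2 − α) and Y by α; all function names are hypothetical.

```python
import math

def cardioid_pair(azimuth_deg):
    """Pickup gains of coincident cardioids aiming at +90 deg (left) and -90 deg (right)."""
    phi = math.radians(azimuth_deg)
    return 0.5 * (1.0 + math.sin(phi)), 0.5 * (1.0 - math.sin(phi))

def ms_encode(c_left, c_right):
    """Sum and difference of the cardioid signals give mid W and side Y."""
    return c_left + c_right, c_left - c_right

def ms_decode(w, y, alpha=1.0):
    """Stereo feeds (L, R); alpha = 0 is narrow (mono), alpha = 1 is full width."""
    w_scaled, y_scaled = (2.0 - alpha) * w, alpha * y
    return 0.5 * (w_scaled + y_scaled), 0.5 * (w_scaled - y_scaled)

c_left, c_right = cardioid_pair(30.0)
w, y = ms_encode(c_left, c_right)
print("full width:", ms_decode(w, y, alpha=1.0))
print("narrow:    ", ms_decode(w, y, alpha=0.0))
```

With α = 1 the decoder returns the original cardioid signals, and with α = 0 both loudspeakers receive the same mid signal.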


1.3 First-Order Ambisonics (FOA) 


After Cooper and Shiga [10] worked on expressing panning strategies for arbitrary 
surround loudspeaker setups in terms of a directional Fourier series, the notion 
and technology of Ambisonics was developed by Fellgett [11], Gerzon [7], and
Craven [12]. In particular, they were also considering a suitable recording tech- 
nology. 

Essentially based on similar considerations as MS, one can define first-order 
Ambisonic recording. For 2D recordings, a Double-MS microphone arrangement is 
suitable and only requires one more microphone than MS recording: a front-back 
oriented figure-of-eight microphone. The scheme is extended to 3D first-order
Ambisonics by a third figure-of-eight microphone aiming up-down. First-order
Ambisonics is still often the basis of today’s virtual reality applications and
360° audio streams on the internet. In addition to potential loudspeaker playback,
it permits interactive playback on head-tracked headphones to render the acoustic
sound scene static around the listener.

First-order Ambisonic recording has the advantage that it can be done with 
only a few high-quality microphones. However, the sole distribution of first-order 
Ambisonic recordings to playback loudspeakers is typically not convincing without 
going to higher orders and directional enhancements (Sect. 5.8). 


1.3.1 2D First-Order Ambisonic Recording and Playback 


The first-order Ambisonic format in 2D consists of one signal corresponding to an 
omnidirectional pickup pattern (called W), and two signals corresponding to the 
figure-of-eight pickup patterns aligned with the Cartesian axes (X and Y). 


Native 2D Ambisonic recording (Double-MS). To record the Ambisonic channels 
W, X, Y, one can use a Double-MS arrangement as shown in Fig. 1.5. 


2D Ambisonic recording with four 90°-angled cardioids. Extending the MS scheme
for recording with cardioid microphones, Fig. 1.3, cardioid microphones could be 
used to obtain the front-back and left-right figure-of-eight pickup patterns by corre- 
sponding pair-wise differences, and one omnidirectional pattern as their sum, Fig. 1.6. 
However, the use of 4 microphones for only 3 output signals is inefficient. 


(a) Native 2D FOA recording (b) Picture of recording setup 


Fig. 1.5 Native 2D first-order Ambisonic recording with an omnidirectional and a figure-of-eight 
microphone heading front, and a figure-of-eight microphone heading left; photo shown on the right 
(a) 2D FOA with 4 cardioid microphones (b) Picture of recording setup


Fig. 1.6 2D first-order Ambisonic recording with four 90°-angled cardioid microphones, sums and 
differences between them (front-back, left-right)


(a) 2D FOA with 3 cardioid microphones (b) Picture of recording setup 


Fig. 1.7 2D first-order Ambisonics with three 120°-angled cardioid microphones 


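2D Ambisonic recording with three 120°-angled cardioids. Assuming 3 coincident
cardioid microphones aiming at the angles 0°, +120°, and −120° in the horizontal
plane, cf. Fig. 1.7, we obtain as the pickup pattern for the incoming sound

$$\boldsymbol{g}(\varphi) = \frac{1}{2} + \frac{1}{2} \begin{bmatrix} \cos\varphi \\ \cos(\varphi - 120^\circ) \\ \cos(\varphi + 120^\circ) \end{bmatrix},$$

where a microphone aiming at +120° (back-left) picks up $\cos(\varphi - 120^\circ)$.
Combining all three microphone signals yields an omnidirectional pickup
pattern, as $\sum_{k=0}^{2} \cos(\varphi + \frac{2\pi}{3} k) = 0$. Moreover, introducing the
differences between the front and the two back microphone signals and between the
left and right microphone signals yields an encoding matrix to obtain the
omnidirectional W and the two figure-of-eight characteristics X and Y

$$\frac{2}{3} \begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & -1 \\ 0 & \sqrt{3} & -\sqrt{3} \end{bmatrix} \boldsymbol{g}(\varphi)
= \begin{bmatrix} 1 \\ \cos\varphi \\ \sin\varphi \end{bmatrix}. \tag{1.8}$$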
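The encoding matrix of Eq. (1.8) can be checked numerically. The sketch below assumes the microphone order front (0°), back-left (+120°), back-right (−120°), a cardioid gain of 1/2 + 1/2 cos of the angle between source and aiming direction, and the overall factor 2/3; the helper names are hypothetical.

```python
import math

# Assumed microphone order: front, back-left, back-right (azimuth positive to the left).
AIMS_DEG = (0.0, 120.0, -120.0)

# Rows produce W (sum), X (front minus back pair), and Y (left minus right).
ENCODER = [
    [1.0, 1.0, 1.0],
    [2.0, -1.0, -1.0],
    [0.0, math.sqrt(3.0), -math.sqrt(3.0)],
]

def cardioid_gains(azimuth_deg):
    """Gains of the three cardioids for a source at azimuth_deg."""
    return [0.5 + 0.5 * math.cos(math.radians(azimuth_deg - aim)) for aim in AIMS_DEG]

def encode_wxy(mic_gains):
    """Apply the 2/3-scaled encoding matrix to the three microphone gains."""
    return [2.0 / 3.0 * sum(c * g for c, g in zip(row, mic_gains)) for row in ENCODER]

for azimuth in (0.0, 30.0, 90.0):
    w, x, y = encode_wxy(cardioid_gains(azimuth))
    print(f"{azimuth:5.1f} deg: W = {w:.3f}, X = {x:+.3f}, Y = {y:+.3f}")
```

For any source azimuth, the result should agree with [1, cos φ, sin φ] up to rounding.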
Fig. 1.8 2D first-order Ambisonic decoding to 4 loudspeakers


2D Ambisonic decoding to loudspeakers. The W, X, and Y channels of 2D first-order
Ambisonics (Double-MS) can easily be played on an arrangement of four
loudspeakers: front, left, back, right. While the omnidirectional signal
contribution is played by all of the loudspeakers, the figure-of-eight
contributions are played out-of-phase by the corresponding front-back or
left-right pair of loudspeakers, Fig. 1.8.

$$\begin{bmatrix} F \\ L \\ B \\ R \end{bmatrix} = \frac{1}{2}
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}. \tag{1.9}$$

The decoding weights obviously discretize the directional pickup characteristics of
the Ambisonic channels at the directions of the loudspeaker layout. Consequently, if
the loudspeaker layout is more arbitrary and described by the set of its angles
$\{\varphi_l\}$, the sampling decoder can be given as

$$\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} = \frac{1}{2}
\begin{bmatrix} 1 & \cos\varphi_1 & \sin\varphi_1 \\ \vdots & \vdots & \vdots \\ 1 & \cos\varphi_L & \sin\varphi_L \end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}. \tag{1.10}$$

To achieve a panning-invariant and balanced mapping by this decoder, loudspeakers
should be evenly arranged. Moreover, it can be favorable to sharpen the spatial image
by attenuating W by $\frac{1}{\sqrt{2}}$ to map a sound by a supercardioid playback pattern.
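A minimal Python sketch of the 2D sampling decoder of Eq. (1.10) follows. It assumes the encoding [1, cos φ, sin φ] for a unit-amplitude source and the decoder factor 1/2; the optional `w_gain` parameter (a value below 1 attenuates W to narrow the playback pattern) and all function names are assumptions made for illustration.

```python
import math

def encode_2d(azimuth_deg):
    """First-order 2D encoding gains [W, X, Y] of a source at azimuth_deg."""
    phi = math.radians(azimuth_deg)
    return [1.0, math.cos(phi), math.sin(phi)]

def sampling_decoder_2d(speaker_azimuths_deg, w_gain=1.0):
    """One decoder row [W, X, Y weights] per loudspeaker direction."""
    rows = []
    for azimuth in speaker_azimuths_deg:
        a = math.radians(azimuth)
        rows.append([0.5 * w_gain, 0.5 * math.cos(a), 0.5 * math.sin(a)])
    return rows

def decode(decoder, wxy):
    """Multiply the decoder matrix with the Ambisonic signal vector."""
    return [sum(d * c for d, c in zip(row, wxy)) for row in decoder]

square = sampling_decoder_2d([0.0, 90.0, 180.0, -90.0])  # front, left, back, right
for azimuth in (0.0, 45.0):
    feeds = decode(square, encode_2d(azimuth))
    print(azimuth, [round(f, 3) for f in feeds], "sum =", round(sum(feeds), 3))
```

On the evenly spaced square layout, the sum of the loudspeaker gains stays the same for every panning direction, which illustrates why evenly arranged loudspeakers are recommended.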


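Playback to head-tracked headphones and interactive rotation. In headphone
playback, the headphone signals are generated by convolution with the head-related
impulse responses (HRIRs) of all four loudspeakers contributing to the left and
the right ear signals

$$\begin{bmatrix} L_\mathrm{ear} \\ R_\mathrm{ear} \end{bmatrix} =
\begin{bmatrix}
h_\mathrm{L}^{0^\circ}(t)\,* & h_\mathrm{L}^{90^\circ}(t)\,* & h_\mathrm{L}^{180^\circ}(t)\,* & h_\mathrm{L}^{-90^\circ}(t)\,* \\
h_\mathrm{R}^{0^\circ}(t)\,* & h_\mathrm{R}^{90^\circ}(t)\,* & h_\mathrm{R}^{180^\circ}(t)\,* & h_\mathrm{R}^{-90^\circ}(t)\,*
\end{bmatrix}
\begin{bmatrix} F \\ L \\ B \\ R \end{bmatrix}, \tag{1.11}$$

where $h_\mathrm{L}^{\varphi}(t)$ and $h_\mathrm{R}^{\varphi}(t)$ denote the HRIRs from the
loudspeaker at azimuth $\varphi$ to the left and right ear, and $*$ denotes convolution.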
Fig. 1.9 2D first-order Ambisonic decoding to head-tracked headphones (the head tracker controls the rotator)


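To rotate the Ambisonic input scene of the decoder, it is sufficient to obtain a new
set of figure-of-eight signals by mixing the X, Y channels with the following matrix
depending on the rotation angle $\rho$, keeping W unaltered

$$\begin{bmatrix} \tilde{W} \\ \tilde{X} \\ \tilde{Y} \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\rho & -\sin\rho \\ 0 & \sin\rho & \cos\rho \end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}. \tag{1.12}$$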

This effect is important for head-tracked headphone playback to render the VR/360° 
audio scene static around the listener. A complete playback system is shown in 
Fig. 1.9. The big advantage of such a system is that rotational updates can be done 
at high control rates and the HRIRs of the convolver are constant. 
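The rotation of Eq. (1.12) only mixes X and Y, as the following Python sketch shows. It assumes the matrix with cos ρ and sin ρ as given above; the function name is hypothetical, and the sign with which a head-tracker yaw angle enters ρ depends on the tracker's convention, which is not specified here.

```python
import math

def rotate_wxy(wxy, rho_deg):
    """Rotate a 2D first-order scene [W, X, Y] by rho_deg; W stays unaltered."""
    w, x, y = wxy
    c = math.cos(math.radians(rho_deg))
    s = math.sin(math.radians(rho_deg))
    return [w, c * x - s * y, s * x + c * y]

# A source encoded at 30 deg azimuth, rotated by 60 deg, appears at 90 deg.
source = [1.0, math.cos(math.radians(30.0)), math.sin(math.radians(30.0))]
rotated = rotate_wxy(source, 60.0)
print([round(v, 3) for v in rotated])
```

Because only three multiply-add pairs per sample are involved, the rotation can be updated at a high rate while the HRIR convolutions remain fixed.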


1.3.2 3D First-Order Ambisonic Recording and Playback 


The first-order Ambisonic format in 3D consists of a signal W corresponding to an 
omnidirectional pickup pattern, and three signals (X, Y, and Z) corresponding to 
figure-of-eight pickup patterns aligned with the Cartesian coordinate axes. 

In three dimensions, we cannot work with figure-of-eight patterns described by
$\sin\varphi$ or $\cos\varphi$ of the azimuth angle only anymore. It is more convenient
to describe the arbitrarily oriented figure-of-eight characteristics $\cos\phi$ using
the inner product between a variable direction vector (direction of arriving sound)
and a fixed direction vector (microphone direction). Direction vectors are of unit
length $\|\boldsymbol{\theta}\| = 1$ and their inner product corresponds to
$\boldsymbol{\theta}_1^\mathrm{T} \boldsymbol{\theta} = \cos\phi$, where $\phi$ is the angle
enclosed by the direction of arrival $\boldsymbol{\theta}$ and the microphone direction
$\boldsymbol{\theta}_1$. Consequently, a cardioid pickup pattern aiming at
$\boldsymbol{\theta}_1$ is described by
$\frac{1}{2} + \frac{1}{2} \boldsymbol{\theta}_1^\mathrm{T} \boldsymbol{\theta}$.


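Native 3D Ambisonic recording (Triple-MS). To record the Ambisonic channels
W, X, Y, Z, one can use a Triple-MS scheme as shown in Fig. 1.10. With the
transposed unit direction vectors representing the aiming of the figure-of-eight
channels, $\boldsymbol{\theta}_X^\mathrm{T} = [1, 0, 0]$, $\boldsymbol{\theta}_Y^\mathrm{T} = [0, 1, 0]$,
$\boldsymbol{\theta}_Z^\mathrm{T} = [0, 0, 1]$, to produce the direction dipoles
$\boldsymbol{\theta}_X^\mathrm{T}\boldsymbol{\theta}$, $\boldsymbol{\theta}_Y^\mathrm{T}\boldsymbol{\theta}$, and
$\boldsymbol{\theta}_Z^\mathrm{T}\boldsymbol{\theta}$, we can mathematically describe the pickup
patterns of native 3D first-order Ambisonics as

$$\boldsymbol{g}_{WXYZ}(\boldsymbol{\theta}) =
\begin{bmatrix} 1 \\ \boldsymbol{\theta}_X^\mathrm{T}\boldsymbol{\theta} \\ \boldsymbol{\theta}_Y^\mathrm{T}\boldsymbol{\theta} \\ \boldsymbol{\theta}_Z^\mathrm{T}\boldsymbol{\theta} \end{bmatrix}
= \begin{bmatrix} 1 \\ \boldsymbol{\theta} \end{bmatrix}. \tag{1.13}$$

(a) Native 3D FOA recording (b) Picture of the recording setup

Fig. 1.10 Native 3D first-order Ambisonic recording with an omnidirectional and three figure-of-
eight microphones aligned with the Cartesian axes X, Y, Z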

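3D Ambisonic recording with a tetrahedral arrangement of cardioids. The principle
that worked for three cardioid microphones on the horizon also works for a coincident
tetrahedron microphone array of cardioids with the aiming directions FLU-FRD-
BLD-BRU, see Fig. 1.11, and [12],

$$\boldsymbol{g}(\boldsymbol{\theta}) = \frac{1}{2} + \frac{1}{2}
\begin{bmatrix} \boldsymbol{\theta}_\mathrm{FLU}^\mathrm{T} \\ \boldsymbol{\theta}_\mathrm{FRD}^\mathrm{T} \\ \boldsymbol{\theta}_\mathrm{BLD}^\mathrm{T} \\ \boldsymbol{\theta}_\mathrm{BRU}^\mathrm{T} \end{bmatrix} \boldsymbol{\theta}
= \frac{1}{2} + \frac{1}{2\sqrt{3}}
\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix} \boldsymbol{\theta}. \tag{1.14}$$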
BRU 


Encoding is achieved there by the matrix that adds all microphone signals in the 
first line (W omnidirectional), subtracts back from front microphone signals in the 
second line (X figure-of-eight), subtracts right from left microphone signals in the 
third line (Y figure-of-eight), and subtracts down from up microphone signals in the 
last line (Z figure-of-eight), see also Fig. 1.11, 


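$$\boldsymbol{g}_{WXYZ}(\boldsymbol{\theta}) = \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ \sqrt{3} & \sqrt{3} & -\sqrt{3} & -\sqrt{3} \\ \sqrt{3} & -\sqrt{3} & \sqrt{3} & -\sqrt{3} \\ \sqrt{3} & -\sqrt{3} & -\sqrt{3} & \sqrt{3} \end{bmatrix}
\boldsymbol{g}(\boldsymbol{\theta}). \tag{1.15}$$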
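The tetrahedral encoding can be verified numerically with the Python sketch below. It assumes capsule directions FLU, FRD, BLD, BRU on the cube diagonals (x = front, y = left, z = up), cardioid gains 1/2 + 1/2 of the inner product, and the scaling factors 1/2 and √3 chosen so that the encoded pattern equals [1, θ]; all names are hypothetical.

```python
import math

S3 = math.sqrt(3.0)

# Capsule aiming directions as unit vectors (x = front, y = left, z = up).
AIMS = [
    (1 / S3, 1 / S3, 1 / S3),     # FLU: front-left-up
    (1 / S3, -1 / S3, -1 / S3),   # FRD: front-right-down
    (-1 / S3, 1 / S3, -1 / S3),   # BLD: back-left-down
    (-1 / S3, -1 / S3, 1 / S3),   # BRU: back-right-up
]

# Rows: W (sum), X (front - back), Y (left - right), Z (up - down).
ENCODER = [
    [1.0, 1.0, 1.0, 1.0],
    [S3, S3, -S3, -S3],
    [S3, -S3, S3, -S3],
    [S3, -S3, -S3, S3],
]

def direction(azimuth_deg, elevation_deg):
    """Unit vector of a direction given by azimuth and elevation in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))

def cardioid_gains(theta):
    """Gains of the four cardioid capsules for a source from direction theta."""
    return [0.5 + 0.5 * sum(a * t for a, t in zip(aim, theta)) for aim in AIMS]

def encode_wxyz(mic_gains):
    """Apply the 1/2-scaled encoding matrix to the four capsule gains."""
    return [0.5 * sum(c * g for c, g in zip(row, mic_gains)) for row in ENCODER]

theta = direction(30.0, 20.0)
print([round(v, 3) for v in encode_wxyz(cardioid_gains(theta))])
print([1.0] + [round(v, 3) for v in theta])
```

Both printed vectors should coincide, confirming that sums and differences of the capsule signals recover the omnidirectional and the three figure-of-eight patterns.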
(a) Tetrahedral array with four cardioids (b) Encoder of microphone signals 


Fig. 1.11 Tetrahedral arrangement of cardioid microphones for 3D first-order Ambisonics and 
their encoding; microphone capsules point inwards to minimize their spacing 


(a) Tetrahedral array of 4 cardioids (b) SPS200 (c) MK4012 (d) ST450 


Fig. 1.12 Practical tetrahedral recording setups with cardioid microphones; the Soundfield SPS200, 
Oktava MK4012, and Soundfield ST450 offer a fixed 4-capsule geometry. Equally important: 
Zoom’s H3-VR, Røde’s NT-SF1, Sennheiser’s AMBEO VR Mic 


As Fig. 1.12 shows, practical microphone layouts should be as closely spaced as 
possible. Nevertheless for high frequencies, the microphones cannot be considered 
coincident anymore, and besides a directional error, there will be a loss of presence in 
the diffuse field. Typically a shelving filter is used to slightly boost high frequencies. 
Roughly, a high-shelf filter with a 3 dB boost is sufficient to correct timbral defects 
at frequencies above which the microphone spacing exceeds half a wavelength, e.g., 
5 kHz for a 3.4 cm spacing of the microphones. More advanced strategies are found, 
e.g., in [7, 13-15]. 
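The frequency quoted above follows from the half-wavelength condition f = c / (2d). The short Python sketch below reproduces the arithmetic, assuming a speed of sound of 343 m/s (an assumption, since no value is stated in the text); the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def half_wavelength_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Frequency at which the capsule spacing equals half a wavelength."""
    return c / (2.0 * spacing_m)

print(f"{half_wavelength_frequency(0.034):.0f} Hz")
```

For a spacing of 3.4 cm this gives roughly 5 kHz, in line with the example in the text.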


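3D Ambisonic decoding to loudspeakers. As before in the 2D case, a sampling
decoder can be defined that represents the continuous directivity patterns associated
with the channels W, X, Y, Z to map the signals to the discrete directions of the
loudspeakers. Given the set of loudspeaker directions $\{\boldsymbol{\theta}_l\}$ and the
unit vectors to X, Y, Z, the loudspeaker signals of the sampling decoder become

$$\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} = \frac{1}{2}
\begin{bmatrix}
1 & \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_X & \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_Y & \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_Z \\
\vdots & \vdots & \vdots & \vdots \\
1 & \boldsymbol{\theta}_L^\mathrm{T}\boldsymbol{\theta}_X & \boldsymbol{\theta}_L^\mathrm{T}\boldsymbol{\theta}_Y & \boldsymbol{\theta}_L^\mathrm{T}\boldsymbol{\theta}_Z
\end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix}
= \underbrace{\frac{1}{2} \begin{bmatrix} 1 & \boldsymbol{\theta}_1^\mathrm{T} \\ \vdots & \vdots \\ 1 & \boldsymbol{\theta}_L^\mathrm{T} \end{bmatrix}}_{\boldsymbol{D}}
\begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix}. \tag{1.16}$$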


Equivalent panning function/virtual microphone. The sampling decoder together
with the native Ambisonic directivity patterns
$\boldsymbol{g}_{WXYZ}^\mathrm{T}(\boldsymbol{\theta}) = [1, \boldsymbol{\theta}^\mathrm{T}]$ yields the
mapping of a signal $s$ from the direction $\boldsymbol{\theta}$ to the loudspeakers to be

$$\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} = \frac{1}{2}
\begin{bmatrix} 1 & \boldsymbol{\theta}_1^\mathrm{T} \\ \vdots & \vdots \\ 1 & \boldsymbol{\theta}_L^\mathrm{T} \end{bmatrix}
\begin{bmatrix} 1 \\ \boldsymbol{\theta} \end{bmatrix} s
= \frac{1}{2} \begin{bmatrix} 1 + \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta} \\ \vdots \\ 1 + \boldsymbol{\theta}_L^\mathrm{T}\boldsymbol{\theta} \end{bmatrix} s. \tag{1.17}$$

This result means that the gain of a source from $\boldsymbol{\theta}$ at each loudspeaker
$\boldsymbol{\theta}_l$ corresponds to evaluating a cardioid pattern aligned with
$\boldsymbol{\theta}$. Consequently, the Ambisonic mapping corresponds to a signal
distribution to the loudspeakers using weights obtained by discretization of an
Ambisonics-equivalent first-order panning function.

Equivalently, Ambisonic playback using a sampling decoder is comparable to
recording each loudspeaker signal with a virtual first-order cardioid microphone
aligned with the loudspeaker’s direction $\boldsymbol{\theta}_l$.

It is decisive for a panning-independent loudness mapping and balanced
performance that the directions of the loudspeaker layout are well chosen. Also, it
can be preferred to reduce the level of the omnidirectional channel W by
$\frac{1}{\sqrt{3}}$ to map a sound by the narrower supercardioid playback pattern
instead of a cardioid pattern, which is rather broad.

Decoder design problems were early addressed by Gerzon [16], Malham [17],
and Daniel [18]. A current solution for higher-order decoding is given in Sect. 4.9.6
on All-round Ambisonic decoding.

3D Ambisonic decoding to headphones. 3D Ambisonic decoding to headphones
uses the same approach as for 2D above, except that additional rotational degrees of
freedom are implemented to compensate for any change in head orientation. Rotation
concerns the three directional components X, Y, Z

$$\begin{bmatrix} \tilde{X} \\ \tilde{Y} \\ \tilde{Z} \end{bmatrix} = \boldsymbol{R}(\alpha, \beta, \gamma)
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}. \tag{1.18}$$

For the definition of the rotation matrix $\boldsymbol{R}(\alpha, \beta, \gamma)$ and the
meaning of its angles, refer to Eq. 5.5 of Sect. 5.2.2. The selection of a suitable
set of HRIRs is a question of directional discretization of the 3D directions, as
addressed in the decoder above. Signals obtained for virtual loudspeakers are again
to be convolved with the corresponding HRIRs for the left and the right ear.


1.4 Practical Free-Software Examples 


The practical examples below show first-order Ambisonic panning of a mono sound,
decoded to simple loudspeaker layouts. These are either a square layout with 4 
loudspeakers at the azimuth angles [0°, 90°, 180°, —90°] or an octahedral lay- 
out with 6 loudspeakers at azimuth [0°, 90°, 180°, —90°, 0°, 0°] and elevation 
[0°, 0°, 0°, 0°, 90°, —90°]. 


1.4.1 Pd with Iemmatrix, Iemlib, and Zexy 


Pd is free and it can load and install its extensions from the internet. Required software 
components are: 


pure-data (free, http://puredata.info/downloads/pure-data) 
iemmatrix (free download within pure-data) 

zexy (free download within pure-data) 

iemlib (free download within pure-data) 


Figure 1.13 gives an example for horizontal (2D) first-order Ambisonic panning, 
decoded to 4 loudspeaker and 2 headphone signals. 

Figure 1.14 shows the processing inside the Pd abstraction
[FOA_binaural_decoder] contained in the Fig. 1.13 example, which uses SADIE
database¹ subject 1 (KU100 dummy head) HRIRs to render headphone signals.

Figure 1.15 sketches a first-order Ambisonic panning in 3D with decoding to 
an octahedral loudspeaker layout; master level [multiline~] and hardware outlets 
[dac~] were omitted for easier readability. 


1.4.2 Ambix VST Plugins 


This example uses a DAW and ready-to-use VST plug-ins to render first-order 
Ambisonics. As DAW, we recommend Reaper (reaper.fm) because it nicely facili- 
tates higher-order Ambisonics by allowing tracks of up to 64 channels. Moreover, it is 
relatively low-priced and there is a fully functional free evaluation version available. 
You can also use any other DAW that supports VST and sufficiently many multi-track
channels. The example employs the freely available ambiX plug-in suite
(http://www.matthiaskronlachner.com/?p=2015), although there exist other
Ambisonics plug-ins, especially for first-order.


¹ https://www.york.ac.uk/sadie-project/Resources/SADIEIIDatabase/D1/D1_HRIR_WAV.zip.


[Pd patch: azimuth angle in degrees converted by [zexy/deg2rad]; encoding vector from [expr 1.0/2; cos($f1)/2; sin($f1)/2]; decoding matrix; test signal; master level; libraries declared: zexy, iemmatrix]

Fig. 1.13 First-order 2D encoding and decoding in pure data (Pd) for a square layout


[Pd patch: inlets Front, Left, Rear, Right; each signal is filtered by objects such as [FIR~ h90L 256] and [FIR~ h90R 256] using the HRIRs h0, h90, h180, h270 for the left (L) and right (R) ear; two outlets]

Fig. 1.14 Binaural rendering to headphones on pure data (Pd) by convolution with SADIE KU100
HRIRs


Track Name         Ins   Outs   FX
Virtual source 1   1     4      ambix_encoder_o1
MASTER             4     6      ambix_decoder_o1


[Figure 1.15: Pd patch. A 1000 Hz sine test signal ([osc~ 1000]) is encoded into the
4 signals of an Ambisonics bus by an adjustable encoding vector and decoded to the
octahedron loudspeaker signals by a static 6 x 4 decoding matrix. The encoder weights
the test signal by the factors W: 1, -Y sqrt(3): -sqrt(3) sin(azi) cos(ele),
Z sqrt(3): sqrt(3) sin(ele), X sqrt(3): sqrt(3) cos(azi) cos(ele).]

Fig. 1.15 First-order 3D encoding and decoding in pure data (Pd) using 
[mtx_spherical_harmonics] for an octahedral layout ([dac~] omitted for simplicity) 


After creating the new track for the virtual source and importing a mono/stereo 
audio file (per drag-and-drop), the next step is the setup of the track channels. As 
shown in the table, the virtual source has a single-channel (mono) input and 4 output 
channels to send the 4 channels of first-order Ambisonics to the Master. The option 
to send to the Master is activated by default, cf. left in Fig. 1.16. The Master track 
itself requires 4 input channels and 6 output channels to feed the 6 loudspeakers 
(right). In Reaper, there is no separate adjustment for input and output channels, thus 
the Master track has to be set to 6 channels. 

In the source track FX, the ambix_encoder_o1 can be used to encode the virtual 
source signal at an arbitrary location on a sphere by inserting the plug-in into the 
track of the virtual source, cf. its panning GUI in Fig. 1.17. For adding more sources, 
the track of the virtual source can simply be copied or duplicated. All effects and 
routing options are maintained for the new tracks. 

In order to decode the 4 first-order Ambisonics Master channels to the loudspeakers,
the ambix_decoder_o1 plug-in is added to the Master track. The plug-in requires a
preset that defines the decoding matrix and its channel sequence and normalization.
For the exemplary octahedral setup with 6 loudspeakers, the following text can be
copied to a text file and saved as config-file, e.g., “octahedral.config”. The decoder
matrix contains W, -Y, Z, X, with W as constant and -Y, Z, X referring to the Cartesian
coordinates of the octahedron.


#GLOBAL
/coeff_scale n3d
/coeff_seq acn
#END
#DECODERMATRIX
1 0 0 1
1 1 0 0
1 -1 0 0
1 0 0 -1
1 0 1 0
1 0 -1 0
#END


[Figure 1.16: Reaper routing dialogs. Left: track "virtual source 1" with the Master send
enabled and 4 track channels. Right: Master track with 6 track channels and hardware
outputs on channels 1 to 6.]

Fig. 1.16 1st-order example in Reaper DAW: routing


[Figure 1.17: ambix_encoder_o1 plug-in window (1 input, 4 outputs) with its panning GUI.]

Fig. 1.17 1st-order example in Reaper DAW: encoder


[Figure 1.18: ambix_decoder_o1 plug-in window with the preset "octahedral" loaded:
6 loudspeakers with 4 coefficients each, coefficient scaling n3d (getting rescaled to
fit sn3d).]

Fig. 1.18 1st-order example in Reaper DAW: decoder


After loading the preset into the decoder plug-in, the decoder can generate the
loudspeaker signals as shown in Fig. 1.18. In the example, the virtual source is panned
to the front, resulting in the highest level for loudspeaker 1 (front). Loudspeaker
3 (back) is about 12 dB quieter because of a side-lobe-suppressing super-cardioid weighting
implied by the switch /coeff_scale n3d, as a trick to keep things simple.
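
The level difference between front and back can be checked with a short calculation. The sketch below assumes that the rescaling between n3d and sn3d amounts to a relative factor of sqrt(3) between the W column and the first-order columns, so that a loudspeaker row [1, direction] produces the pattern 1 + sqrt(3) cos(angle); the function name and this reading of the plug-in behavior are assumptions, not taken from the plug-in documentation.

```python
import math

SQRT3 = math.sqrt(3.0)


def decoder_pattern(angle_deg, first_order_weight=SQRT3):
    """Gain of one loudspeaker for a source that is angle_deg away from it.

    Assumed pattern: 1 + w * cos(angle), with w = sqrt(3) from the rescaling.
    """
    return 1.0 + first_order_weight * math.cos(math.radians(angle_deg))


front = decoder_pattern(0.0)    # source in the loudspeaker direction
back = decoder_pattern(180.0)   # source opposite to the loudspeaker (negative lobe)
front_to_back_db = 20.0 * math.log10(abs(front) / abs(back))
```

Under this assumption the ratio is (1 + sqrt(3))/(sqrt(3) - 1) = 2 + sqrt(3), i.e. roughly 11.4 dB, which is in line with the approximately 12 dB stated above. Dividing instead of multiplying by sqrt(3) gives the same magnitude ratio.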

As shown on the SADIE-II website,² the SADIE-II head-related impulse responses
can be used to render Ambisonics to headphones. The listing below shows a configuration
file to be used with ambix_binaural, cf. Fig. 1.19, again using the trick to
select n3d to keep the numbers simple and to obtain super-cardioid weighting:


#GLOBAL
/coeff_scale n3d
/coeff_seq acn
#END

#HRTF
44K_16bit/azi_0,0_ele_0,0.wav
44K_16bit/azi_90,0_ele_0,0.wav
44K_16bit/azi_180,0_ele_0,0.wav
44K_16bit/azi_270,0_ele_0,0.wav
44K_16bit/azi_0,0_ele_90,0.wav
44K_16bit/azi_0,0_ele_-90,0.wav
#END

#DECODERMATRIX
1 0 0 1
1 1 0 0
1 -1 0 0
1 0 0 -1
1 0 1 0
1 0 -1 0
#END


² https://www.york.ac.uk/sadie-project/ambidec.html.


[Figure 1.19: ambix_binaural_o1 plug-in window with the preset "OctahedronFOA-binaural"
loaded: 4 Ambisonics input channels, 6 virtual loudspeakers, 6 impulse responses.]

Fig. 1.19 1st-order example in Reaper DAW: binaural decoder


For decoding to less regular loudspeaker layouts, the IEM AllRADecoder³ permits
editing loudspeaker coordinates and automatically calculating a decoder within the
plugin. For decoding to headphones, the IEM BinauralDecoder offers a high-quality
decoder. The technology behind both plugins is explained in Chap. 4.

In addition to the virtual sources, you can also add a 4-channel recording done
with a B-format microphone by placing the 4-channel file in a new track. Reaper
will automatically set the number of track channels to 4 and send the channels to the
Master. Note that some B-format microphones use a different order and/or weighting
of the Ambisonics channels. Simple conversion to the AmbiX-format can be done
by inserting the ambix_converter_o1 plug-in into the microphone track.
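
To illustrate what such a conversion involves, the following sketch converts one first-order sample from the traditional B-format convention (channel order W, X, Y, Z with W attenuated by 1/sqrt(2)) to AmbiX (channel order W, Y, Z, X with SN3D weighting). It assumes that the microphone delivers exactly this traditional convention; the function name is hypothetical, and in practice the converter plug-in takes care of the conversion.

```python
import math


def fuma_to_ambix_first_order(w, x, y, z):
    """Convert one first-order sample from traditional B-format to AmbiX.

    Input order:  W, X, Y, Z (W carries a factor 1/sqrt(2)).
    Output order: W, Y, Z, X (ACN order, SN3D weighting).
    """
    return (math.sqrt(2.0) * w, y, z, x)


# Example sample: W as delivered in the traditional format, plus X, Y, Z.
converted = fuma_to_ambix_first_order(1.0 / math.sqrt(2.0), 0.3, 0.2, 0.1)
```

Only W changes its weight at first order; the three directional channels are merely reordered.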


1.5 Motivation of Higher-Order Ambisonics 


Diffuseness, spaciousness, depth? Diffuse sound fields are typically characterized
by sound arriving randomly from evenly distributed directions at evenly distributed
delays. It is practical knowledge that the impression of diffuseness and spaciousness
benefits from decorrelated signals, which are typically obtained by large
distances between the microphones rather than by coincident microphones.


³ https://plugins.iem.at/.

Due to the evenness of diffuse sound fields, one would still hope that a low spatial 
resolution is sufficient to map diffuseness and spatial depth of a room, using coin- 
cident microphones or first-order Ambisonics. Nevertheless, high directional corre- 
lation during playback destroys this hope and in fact yields a perceptually impeded 
playback of diffuseness, spaciousness, and depth. 

The technical advantages in interactivity and VR as well as the known short- 
comings of first-order coincident recording techniques offer enough motivation to 
increase the directional resolution and go to higher-order Ambisonics, as presented 
in the subsequent chapters. For professional productions, it is often not sufficient to 
only rely on first-order coincident microphone recordings. By contrast, higher-order 
Ambisonics is able to drastically improve the mapping of diffuseness, spaciousness, 
and depth, as shown in the upcoming chapter about psychoacoustical properties of 
many-loudspeaker systems. 

Recording with a higher-order main microphone array increases the required
technological complexity. Nevertheless, digital signal processing and the theory presented
in the later chapters are powerful enough nowadays to achieve this goal.

After all, it seems that delay-based stereophonic recording, such as AB, or
equivalence-based recording, such as ORTF, INA 5, etc., is often required and well-known
in its mapping properties for spaciousness and diffuseness, correspondingly.
What is nice about higher-order Ambisonics: it can make use of these benefits by
embedding such recordings appropriately, see Fig. 1.20.


Facts about higher orders: Ambisonics extended to higher orders permits a refine- 
ment of the directional resolution and hereby improves the mapping of uncorrelated 
sounds in playback. Figure 1.21a shows the correlation introduced in two neighbor- 
ing loudspeaker signals when using Ambisonics, given their spacing of 60°. Given 
the just noticeable difference (JND) of the inter-aural cross correlation, the figure 
indicates that an Ambisonic order of >3 might be necessary to perceptually preserve
decorrelation.


Fig. 1.20 How is a microphone tree represented in Ambisonics, when it consists of 6 cardioids
spaced by 60 cm and 60° on a horizontal ring, and a ring of 4 super cardioids spaced by 40 cm and
90° as height layer, pointing upwards?
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Fig. 1.21 Relation between the Ambisonic order, decorrelation, and perceived depth 


Fig. 1.22 Perceptual sweet area of Ambisonic playback from the front at first (light
gray), third (dark gray), and fifth order (black). It marks the area in which the
perceived direction is plausible, i.e., does not collapse into a single loudspeaker
other than C
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For this reason, the perception of spatial depth strongly improves when increasing 
the Ambisonic order from 1 up to 3, Fig. 1.21b. However, this is only the case when 
seated at the central listening position. Outside this sweet spot, higher orders than 
3, e.g., 5, additionally improve the mapping of depth [19]. Therefore, higher-order 
Ambisonics is important for preserving spatial impressions and when supplying a 
large audience. 

Figure 1.22 shows that the sweet area of perceptually plausible playback increases 
with the Ambisonic order [20]. With fifth-order Ambisonics, nearly all the area 
spanned by the horizontal loudspeakers at the IEM CUBE, the 12 x 10 m concert 
space at our lab, becomes a valid listening area. 
Chapter 2
Auditory Events of Multi-loudspeaker
Playback


It is evident that until one knows what information needs to be 
presented at the listener’s ears, no rational system design can 
proceed. 


Michael A. Gerzon 1976 and AES Vienna [1] 1992. 


Abstract This chapter describes the perceptual properties of auditory events, the 
sound images that we localize in terms of direction and width, when distributing a 
signal with different amplitudes to one or a couple of loudspeakers. These amplitude 
differences are what methods for amplitude panning implement, and they are also 
what mapping of any coincident-microphone recording implies when reproduced 
over the directions of a loudspeaker layout. Therefore several listening experiments 
on localization are described and analyzed that are essential to understand and model 
the psychoacoustical properties of amplitude panning on multiple loudspeakers of 
a 3D audio system. For delay-based recordings or diffuse sounds, there is some
relation; however, it is found to be less stable for the desired applications. Moreover,
amplitude panning is not only about consistent directional localization. Loudness, 
spectrum, temporal structure, or the perceived width should be panning-invariant. 
The chapter also shows experiments and models required to understand and provide 
those panning-invariant aspects, especially for moving sounds. It concludes with 
openly-available response data of most of the presented listening experiments. 


Starting from classic listening experiments on stereo panning by Leakey [2], Wendt 
[3], and pairwise horizontal panning by Theile [4], this chapter explores the relevant 
perceptual properties for 3D amplitude panning and their models. Important exper- 
imental studies considered here are for instance those by Simon [5], Kimura [6], 
F. Wendt [7], Lee [8], Helm [9], and Frank [10, 11]. By the experimental results, it
is possible to firmly establish Gerzon’s [1] $E$, $\mathbf{r}_\mathrm{E}$, and $\|\mathbf{r}_\mathrm{E}\|$ estimators for perceived
loudness, direction, and width that apply to most stationary sounds in typical studio
and performance environments.


2.1 Loudness 


At a measurement point in the free field, the same signal fed to equalized loudspeakers
of exactly the same acoustic distance would superimpose constructively (+6 dB).

In a room with early reflections and a less strict equality of the incoming pair of 
sounds (typical, slight inaccuracy in loudspeaker/listener position, different mount- 
ing situations, different directions in the directivities of ears and loudspeakers), the 
superposition can be regarded as stochastically constructive (+3 dB) in particular at 
frequencies that aren’t very low. 

For the above reasoning, typical amplitude panning rules try to keep the weights
distributing the signal to the loudspeakers normalized by root of squares instead of
normalizing to the linear sum, in order to obtain constant loudness ([12], VBAP):

$$ g_l \leftarrow \frac{g_l}{\sqrt{\sum_{l'=1}^{L} g_{l'}^2}}. \qquad (2.1) $$


Loudness Model. If all loudspeakers are equalized, located at the same distance to the
listener, and fed by the same signal with different amplitude gains $g_l$, a constructive
interference could be expected so that the amplitude becomes [1]

$$ P = \sum_{l=1}^{L} g_l. \qquad (2.2) $$


However, the interference ceases to be strictly constructive as soon as the room is not
entirely anechoic, the sitting position is not exactly centered, or even for anechoic and
centered conditions at high frequencies, when the superposition at the ears cannot
be assumed to be purely constructive anymore. Then it is better to assume a less
well-defined, stochastic superposition in which a squared amplitude is determined
by the sum of the squared weights [1]:

$$ E = \sum_{l=1}^{L} g_l^2. \qquad (2.3) $$


Therefore, the most common amplitude panning rules use root-squares normalization 
to obtain a loudness impression that is as constant as possible. 
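
The two measures and the root-of-squares normalization can be tried out with a few lines of Python. This is only an illustrative sketch; the function names are chosen here and do not come from the text.

```python
import math


def amplitude_sum(gains):
    """P: linear sum of the gains (strictly constructive superposition)."""
    return sum(gains)


def energy_sum(gains):
    """E: sum of the squared gains (stochastic superposition)."""
    return sum(g * g for g in gains)


def normalize_energy(gains):
    """Scale the gains so that the sum of their squares becomes 1."""
    norm = math.sqrt(energy_sum(gains))
    return [g / norm for g in gains]


# Two loudspeakers driven equally, e.g. a phantom source in the middle of a pair.
center = normalize_energy([1.0, 1.0])
level_gain_db = 20.0 * math.log10(amplitude_sum(center))
```

For the equally driven pair, the normalized gains are 1/sqrt(2) each, so E stays at 1 while P rises to sqrt(2), i.e. by about 3 dB. This is the difference between the stochastic and the strictly constructive assumption discussed above.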

The measure $E$ seems to be most useful when designing and evaluating amplitude-panning
or coincident microphone techniques. It is not surprising that the ITU-R
BS.1770-4¹ uses the Leq(RLB) measure as a loudness model: it is essentially the
RMS level after high-pass filtering, cf. [13], which is closely related to the $E$ measure
detected from loudspeaker signals.

An interesting refinement was proposed by Laitinen et al. [14], which uses a
measure $\sqrt[p]{\sum_{l=1}^{L} g_l^p}$ in which the exponent $p$ is close to 1 at low frequencies under
anechoic conditions and close to 2 at high frequencies/under reverberant conditions.
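
A normalization with an adjustable exponent can be sketched as follows. The function name is hypothetical, and the way p would be chosen per frequency band is not modeled here; the sketch only shows that p = 1 and p = 2 reproduce the linear and the root-of-squares normalization.

```python
def normalize_p(gains, p):
    """Scale the gains so that the p-th root of the sum of |g|^p becomes 1."""
    norm = sum(abs(g) ** p for g in gains) ** (1.0 / p)
    return [g / norm for g in gains]


linear = normalize_p([1.0, 1.0], 1.0)   # p = 1: gains sum up to 1
energy = normalize_p([3.0, 4.0], 2.0)   # p = 2: squared gains sum up to 1
```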


2.2 Direction 


In the early years of stereophony, researchers investigated the differences in delay 
times and amplitudes required to control the perceived direction. Below, only exper- 
iments are considered that did not use fixation of the listener’s head. 


2.2.1 Time Differences on Frontal, Horizontal Loudspeaker 
Pair 


The dissertation of K. Wendt in 1963 [3] shows notably accurate listening experiments
done on ±30° two-channel stereophony using time delays, in which listeners
indicated from where they heard the sounds for each of the tested time differences.
H. Lee revisited the properties in 2013 [8], but with musical sound material and an
experiment in which the listener adjusted the time differences until the perceived
direction matched the one of a corresponding fixed reference loudspeaker, Fig. 2.1.
Time differences are seldom suitable for a reliable angular placement of auditory
events: auditory images are strongly frequency-dependent (not shown here) and
therefore unstable for narrow-band sounds. Leakey and Cherry showed in 1957 [2] that
time-delay stereophony loses its effect in the presence of background noise.


2.2.2 Level Differences on Frontal, Horizontal Loudspeaker 
Pair 


K. Wendt’s [3] and H. Lee’s [8] experiments deliver insights into sound source positioning
with ±30° two-channel stereophony, however this time with level differences.

As opposed to Fig. 2.1, in which auditory image panning with time differences
was characterized by statistical spreads of up to 15°, level-difference-based panning
exhibits a spread of perceived directions that is clearly smaller than 10°, Fig. 2.2.


Signal dependency. Wendt [3] described the signal dependency of panning curves
on various transient and band-limited sounds, and Lee [8] for musical sounds. A new
comprehensive investigation on frequency dependency was carried out by Helm and
Kurz [9]. With level differences {0, 3, 6, 9, 12} dB and third-octave filtered pulsed
pink noise at {125, 250, 500, 1k, 2k, 4k} Hz, they showed that the perceived angle
pointed at by the listeners using a motion-tracked pointer was similar between the
broad-band case and third-octave bands below 2 kHz. In bands above 2 kHz, smaller
level differences cause a larger lateralization, see interpolated curves in Fig. 2.3.


¹ https://www.itu.int/rec/R-REC-BS.1770-4-201510-I/en Algorithms to measure audio
programme loudness and true-peak audio level (10/2015).


Fig. 2.1 K. Wendt’s experiment [3] used angular marks helping to specify the localized direction
(left). Right shows results for time differences between impulse signals fed to loudspeakers, no head
fixation (diagram shows means and standard deviation; the standard deviation was interpolated for
the figure). In gray: Results of the time-difference adjustment experiment of Lee [8] using musical
material (25, 50, 75% quartiles, symmetrized diagram)

Fig. 2.2 Wendt’s [3] results to crack (impulsive) signals with level differences and without head
fixation (the figure shows means and standard deviation; standard deviation was interpolated to
plot this figure). In gray: Results of Lee’s [8] level-difference adjustment experiment with musical
sounds (25, 50, 75% quartiles, symmetrized diagram)

Fig. 2.3 Panning curve for frontal ±30° loudspeaker pair from [9] on the example of the 500 Hz and
4 kHz third-octave band and the slopes for different bands, based on the 3 and 6 dB conditions


2.2.3 Level Differences on Horizontally Surrounding Pairs 


Successive pairwise panning on neighboring loudspeaker pairs is typically used to 
pan auditory events freely along the loudspeakers of a horizontally surrounding loud- 
speaker ring. The classical research done specifically targeted at such applications 
was contributed by Theile and Plenge 1977 [4]. They used a mobile reference loud- 
speaker with some reference sound that could be moved to match the perceived 
direction of a loudspeaker pair playing pink noise with level differences at differ- 
ent orientations with respect to the listener’s head. There is also the experiment of 
Pulkki [15] using a level-adjustment task, in which levels were adjusted as to match 
the auditory event to one of a reference loudspeaker at three different reference 
directions and for different head orientations. A comprehensive experiment was done 
by Simon et al. [5], who used a graphical user interface displaying the floor plan of 
a 45°-spaced loudspeaker ring to have the listeners specify the perceived direction. 
Martin et al. in 1999 [16] used a graphical user interface showing the floorplan of 
a 5.1 ring in their experiment, and last but not least, Matthias Frank used a direct 
pointing method to enter the perceived direction [10] in one of his experiments. 

As the experiments did not seem to yield consistent results, a comprehensive
level-difference adjustment experiment with 24 loudspeakers arranged as a horizontal ring
was done in [17] and partially repeated later in [11], see results in Fig. 2.4. In the
repeated experiment [11] it became clear that in the anechoic room, a large amount
of the differently pronounced localization biases can be avoided by encouraging
the listeners to do front-back and left-right head motion by a few centimeters,
whenever there is doubt.

Fig. 2.4 Medians and 95% confidence intervals for adjusted level differences to align
amplitude-panned pink-noise with harmonic complex tone from {±15°, 0°}, for a frontal and b lateral 60°
stereo pair; a uses data from [17] with 4 responses per direction from 5 listeners; b uses data
from [11] with 20 responses per direction. Despite the considerably different spread, frontal and
lateral stereo pairs seem to yield pretty much the same tendency


2.2.4 Level Differences on Frontal, Horizontal to Vertical 
Pairs 


T. Kimura quite extensively investigated the localization of auditory events between
frontal, vertical ±13.5° loudspeaker pairs in 2012 [6, 18]. The work of F. Wendt in
2013 [7, 19] also investigated a slanted and a vertical loudspeaker pair, Fig. 2.5. Kimura
used pulsed white noise, Wendt used pulsed pink noise.

Obviously, the horizontal spread is always smaller than the vertical spread and the
spread does not align with the direction of the loudspeaker pair. The largest vertical
spread appears for the vertical loudspeaker pair.


2.2.5 Vector Models for Horizontal Loudspeaker Pairs 


A weighted sum of the loudspeakers’ direction vectors $\boldsymbol{\theta}_1$, $\boldsymbol{\theta}_2$ could be conceived
as a simple linear model of the perceived direction, using a linear blending parameter
$0 \le q \le 1$

$$ \mathbf{r} = (1-q)\,\boldsymbol{\theta}_1 + q\,\boldsymbol{\theta}_2. \qquad (2.4) $$

The parameter $q$ adjusts where the resulting vector $\mathbf{r}$ is located on the connecting
line between $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$. On frontal loudspeaker pairs, localization curves typically
run through the middle direction $q = \frac{1}{2}$ for level differences of 0 dB. If only one
loudspeaker is active, the result is either of the loudspeaker directions, thus the
parameter is $q = 0$ or $q = 1$.

Fig. 2.5 Mean values and 95% confidence intervals of the direct-pointing experiments of Kimura
(top) with level differences on a vertical ±13.5° loudspeaker pair and results of F. Wendt (bottom) on
frontally arranged horizontal, slant, and vertical ±20° loudspeaker pairs showing two-dimensional
95% confidence (solid) and standard deviation ellipses (dotted)


Classical definitions. As the simplest choice for $q$, one could insert $q = \frac{g_2}{g_1+g_2}$
or $q = \frac{g_2^2}{g_1^2+g_2^2}$ to get the vector definitions as weighted average using either the linear or
squared gains according to [1]:

$$ \mathbf{r}_\mathrm{V} = \frac{g_1\,\boldsymbol{\theta}_1 + g_2\,\boldsymbol{\theta}_2}{g_1 + g_2}, \qquad \mathbf{r}_\mathrm{E} = \frac{g_1^2\,\boldsymbol{\theta}_1 + g_2^2\,\boldsymbol{\theta}_2}{g_1^2 + g_2^2}. \qquad (2.5) $$

For both models, equal gains $g_1 = g_2$ yield $q = \frac{1}{2}$, and also the endpoints with $g_2 = 0$
or $g_1 = 0$ correspond to $q = 0$ or $q = 1$, respectively. However, the slope of the $\mathbf{r}_\mathrm{E}$
vector is steeper than the one of the $\mathbf{r}_\mathrm{V}$. For instance, if $g_2 = 2\,g_1$, the vector $\mathbf{r}_\mathrm{V}$ lies
on $q = 2/3$ of the line between $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, while $\mathbf{r}_\mathrm{E}$ lies at $q = 4/5$ of the connecting
line.

The $\mathbf{r}_\mathrm{V}$ vector for the $\pm\alpha$ loudspeaker pair at the directions $\boldsymbol{\theta}_{1,2} = (\cos\alpha, \pm\sin\alpha)$
corresponds to the tangent law [20], whose formal origin lies in a model of summing
localization based on a simple model of the ear signals, cf. Appendix A.7. The equivalence
of this law to the vector model follows from the tangent $\tan\varphi$ as ratio of the
$y$ divided by $x$ component of the $\mathbf{r}_\mathrm{V}$ vector,

$$ \tan\varphi = \frac{g_1 \sin(\alpha) + g_2 \sin(-\alpha)}{g_1 \cos(\alpha) + g_2 \cos(-\alpha)} = \frac{g_1 - g_2}{g_1 + g_2}\,\tan\alpha. $$

[Figure 2.6, panel titles: (c) lateral 60° stereo pair, broadband; (d) frontal horizontal and
vertical ±20° pair, broadband]

Fig. 2.6 Fit of the $\mathbf{r}_\mathrm{V}$, $\mathbf{r}_\mathrm{E}$, and $\mathbf{r}_\gamma$ models for a third-octave noise on a frontal stereo pair using
data from [9], and with data from [11]: b pink noise frontal and c lateral, cf. Figs. 2.3 and 2.4; d
horizontal and vertical from [7], Fig. 2.5
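
The equivalence between the linear-gain vector model and the tangent law can be verified numerically. The sketch below computes the angle of the gain-weighted direction vector of a ±α pair and compares it with the tangent-law angle; the function names are chosen here for illustration only.

```python
import math


def rv_angle_deg(g1, g2, alpha_deg):
    """Angle of the linearly gain-weighted sum of the directions (cos a, +/- sin a)."""
    a = math.radians(alpha_deg)
    x = (g1 * math.cos(a) + g2 * math.cos(-a)) / (g1 + g2)
    y = (g1 * math.sin(a) + g2 * math.sin(-a)) / (g1 + g2)
    return math.degrees(math.atan2(y, x))


def tangent_law_angle_deg(g1, g2, alpha_deg):
    """Angle predicted by tan(phi) = (g1 - g2) / (g1 + g2) * tan(alpha)."""
    a = math.radians(alpha_deg)
    return math.degrees(math.atan((g1 - g2) / (g1 + g2) * math.tan(a)))


# Example: loudspeaker 1 is 6 dB louder than loudspeaker 2 on a +/-30 degree pair.
vector_angle = rv_angle_deg(1.0, 0.5, 30.0)
tangent_angle = tangent_law_angle_deg(1.0, 0.5, 30.0)
```

Both functions return the same angle of about 10.9° for this example, and they coincide with the loudspeaker direction when only one loudspeaker is active.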


Adjusted slope. Differently steep curves were fitted by an adjustable-slope model [17]

$$ \mathbf{r}_\gamma = \frac{|g_1|^\gamma\,\boldsymbol{\theta}_1 + |g_2|^\gamma\,\boldsymbol{\theta}_2}{|g_1|^\gamma + |g_2|^\gamma}, \qquad (2.6) $$

which uses $\gamma = 1$ for $\mathbf{r}_\mathrm{V}$ and $\gamma = 2$ for $\mathbf{r}_\mathrm{E}$. Figure 2.6 compares the prediction by
$\mathbf{r}_\mathrm{V}$, $\mathbf{r}_\mathrm{E}$, and $\mathbf{r}_\gamma$ to frequency-dependently perceived directions in frontal horizontal
pairs, to perceived directions in a lateral stereo pair, and to perceived directions in
a frontal pair that is either horizontal or vertical, using various studies mentioned
above.
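
The adjustable-slope model can be written as a small function that returns the blending parameter q and the resulting vector for any exponent. This is a sketch with names chosen here; it reproduces the example values q = 2/3 and q = 4/5 mentioned earlier for a gain ratio of 2.

```python
def blend_q(g1, g2, gamma):
    """Blending parameter q of the adjustable-slope model for two gains."""
    a1 = abs(g1) ** gamma
    a2 = abs(g2) ** gamma
    return a2 / (a1 + a2)


def r_gamma(g1, g2, theta1, theta2, gamma):
    """Weighted direction vector (1 - q) * theta1 + q * theta2."""
    q = blend_q(g1, g2, gamma)
    return tuple((1.0 - q) * c1 + q * c2 for c1, c2 in zip(theta1, theta2))


q_linear = blend_q(1.0, 2.0, 1.0)    # exponent 1: linear gains
q_squared = blend_q(1.0, 2.0, 2.0)   # exponent 2: squared gains
midpoint = r_gamma(1.0, 1.0, (1.0, 0.0), (0.0, 1.0), 2.0)
```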


Practical choice $\mathbf{r}_\mathrm{E}$. While a specific exponent $\gamma$ closely fitting the experimental
data may vary, a constant value is preferable. Figure 2.6 indicates that in most cases
focusing on $\mathbf{r}_\mathrm{E}$ is reasonable and sufficiently precise, see also [11].

Fig. 2.7 Indirect level-adjustment experiment of Pulkki [21] shows the spread and mean of the
adjusted VBAP angles for frontal loudspeaker triplets, and the experiments of F. Wendt [7, 19]
use a direct pointing method to obtain results in the shape of two-dimensional 95% confidence
(solid) and standard deviation ellipses (dotted) for {−∞, 0, +11.71} dB for the top loudspeaker
(left diagram), or the right loudspeaker (center diagram) respectively, or {−∞, 0, +11.51} dB for
the bottom loudspeaker (right)


2.2.6 Level Differences on Frontal Loudspeaker Triangles 


V. Pulkki [21] and F. Wendt [7, 19] investigated localization properties for frontal 
loudspeaker triplets with level differences, see Fig. 2.7. Both used pulsed pink noise 
in their experiments. 

While V. Pulkki used an indirect adjustment task to evaluate VBAP control angles
to obtain auditory events directionally matching the respective reference loudspeakers,
F. Wendt used a direct pointing method. Wendt’s experiments indicate that loudspeaker
triplets with three different azimuthal positions yield a smaller spread in the
indicated direction than those containing vertical loudspeaker pairs (not the case in Pulkki’s
experiments).


2.2.7 Level Differences on Frontal Loudspeaker Rectangles 


F. Wendt [7, 19] moreover presents experiments about frontal loudspeaker rectangles, 
again using a pointer method and pulsed pink noise, Fig. 2.8. 

Again it seems that arrangements avoiding vertical loudspeaker pairs exhibit a 
smaller statistical spread in the responses. 
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Fig. 2.8 Wendt’s experiments about frontal loudspeaker rectangles showing two-dimensional 95% 
confidence (solid) and standard deviation ellipses (dotted). The experimental setup of this and 
above-mentioned experiments is shown. Left: each of the corner loudspeakers is raised once by 
+6 dB in level, right: both left/right loudspeaker levels are raised once by {+3, +6} dB, and both 
top/bottom pairs are once raised by +6 dB 


2.2.8 Vector Model for More than 2 Loudspeakers 


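For more than two active loudspeakers and in 3D, a vector model based on the
exponent $\gamma = 2$ yields the $\mathbf{r}_\mathrm{E}$ vector [1]

$$ \mathbf{r}_\mathrm{E} = \frac{\sum_{l=1}^{L} g_l^2\,\boldsymbol{\theta}_l}{\sum_{l=1}^{L} g_l^2}. \qquad (2.7) $$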
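
A direct implementation of this vector for an arbitrary number of loudspeakers might look as follows. The helper that turns azimuth and elevation into a unit vector and both function names are assumptions made for this sketch.

```python
import math


def unit_vector(azimuth_deg, elevation_deg=0.0):
    """Cartesian unit vector (x, y, z) for a direction given in degrees."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def energy_vector(gains, directions):
    """Sum of the direction vectors weighted by squared gains, divided by the sum of squared gains."""
    energy = sum(g * g for g in gains)
    return tuple(
        sum(g * g * d[i] for g, d in zip(gains, directions)) / energy
        for i in range(3)
    )


# Equal gains on a +/-30 degree stereo pair.
pair = [unit_vector(30.0), unit_vector(-30.0)]
r_e = energy_vector([1.0, 1.0], pair)
r_e_length = math.sqrt(sum(c * c for c in r_e))
```

For the equally driven ±30° pair, the vector points to the front and its length equals cos 30°, which is smaller than 1 because the energy is spread over two directions.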
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2.2.9 Vector Model for Off-Center Listening Positions 


At off-center listening positions, the distances to the loudspeakers are not equal anymore,
resulting in additional attenuation and delay for each loudspeaker depending
on the position. For stationary sounds, this effect can be incorporated into the energy
vector by additional weights $w_{r,l}$ and $w_{\tau,l}$

$$ \mathbf{r}_\mathrm{E} = \frac{\sum_{l=1}^{L} (w_{r,l}\,w_{\tau,l}\,g_l)^2\,\boldsymbol{\theta}_l}{\sum_{l=1}^{L} (w_{r,l}\,w_{\tau,l}\,g_l)^2}. \qquad (2.8) $$

The weight $w_{r,l}$ models the attenuation of a point-source-like propagation $\frac{1}{r}$. The
reference distance is the distance to the closest loudspeaker at the evaluated listening
position, thus the weight of each loudspeaker results in

$$ w_{r,l} = \frac{1}{r_l}. \qquad (2.9) $$

The incorporation of delays into the energy vector requires a transformation that
yields the weights $w_{\tau,l}$ for each loudspeaker. It is reasonable that these weights attenuate
the lagging signals in order to reduce their influence on the predicted direction.
An attenuation of 18 is known from the echo threshold in [22], similarly [23], 
and has successfully been applied for the prediction of localization in rooms [24]. 
The weight of each loudspeaker is calculated as t; = s in seconds at the listening 


position under test 


Wry = 1070", (2.10) 

Further weights can be applied in order to model the precedence effect in more
detail, as proposed by Stitt [25, 26]. Listening test results in [27] compared the
differently complex extensions of the energy vector and revealed that the simple
weighting with $w_{r,l}$ and $w_{\tau,l}$ is sufficient for a rough prediction of the perceived
direction in typical playback scenarios.

The left side of Fig. 2.9 shows the predicted directions by the energy vector for
various listening positions when playing back the same signal on a standard stereo
loudspeaker pair with a radius of 2.5 m. The absolute localization error can be calculated
from the difference of the predicted direction and the desired panning direction.
The right side of Fig. 2.9 depicts areas with localization errors within 4 ranges:
0° ... 10° (white, perfect localization), 10° ... 30° (light gray, plausible localization),
30° ... 90° (gray, rough localization), and >90° (dark gray, poor localization).

Concerning a single playback scenario, i.e. a single panning direction on a 
loudspeaker setup, the perceptual sweet area for plausible playback can be estimated 
by the area with localization errors below 30°. For the prediction of a more general 
sweet area, the absolute localization errors can be computed for all possible panning 
directions in a fine grid of 1° and averaged at each listening position as shown in 
Fig. 2.10. 
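
The weighted energy vector and the four error ranges can be sketched in a few lines. Several simplifications are assumptions of this sketch: distances are expressed relative to the closest loudspeaker, delay weights are not modeled and must be supplied by the caller, loudspeaker directions are given as fixed 2D unit vectors, and the assignment of the range boundaries (10°, 30°, 90°) to the lower class is a choice made here.

```python
import math


def distance_weights(distances):
    """Assumed distance weights: closest loudspeaker gets 1, others r_ref / r_l."""
    r_ref = min(distances)
    return [r_ref / r for r in distances]


def weighted_energy_vector(gains, directions, weights):
    """Energy vector with an extra weight per loudspeaker applied to its gain."""
    effective = [w * g for w, g in zip(weights, gains)]
    energy = sum(e * e for e in effective)
    dims = len(directions[0])
    return tuple(
        sum(e * e * d[i] for e, d in zip(effective, directions)) / energy
        for i in range(dims)
    )


def localization_class(error_deg):
    """Map an absolute localization error in degrees to the four ranges of the text."""
    error = abs(error_deg)
    if error <= 10.0:
        return "perfect"
    if error <= 30.0:
        return "plausible"
    if error <= 90.0:
        return "rough"
    return "poor"


# Stereo pair at +/-30 degrees, same signal on both, listener twice as far from loudspeaker 2.
a = math.radians(30.0)
directions = [(math.cos(a), math.sin(a)), (math.cos(a), -math.sin(a))]
weights = distance_weights([2.0, 4.0])
r_e = weighted_energy_vector([1.0, 1.0], directions, weights)
predicted_deg = math.degrees(math.atan2(r_e[1], r_e[0]))
error_class = localization_class(predicted_deg - 0.0)  # desired direction: 0 degrees
```

In this example the predicted direction moves by about 19° towards the closer loudspeaker, which would still count as plausible localization. Averaging such errors over all panning directions at each listening position leads to the more general sweet-area estimate described above.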
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Fig. 2.9 Predictions of perceived directions by the energy vector for different listening positions in 
a standard stereo setup with two loudspeakers playing the same signal. Gray-scale areas on the right 
indicate listening areas with predicted absolute localization errors within different angular ranges 
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2.3 Width 


M. Frank [10] investigated the auditory source width for frontal loudspeaker pairs 
with 0 dB level difference and various aperture angles, as well as the influence of
an additional center loudspeaker on the auditory source width. The response was 
given by reading numbers off a left-right symmetric scale written on the loudspeaker 
arrangement (Fig. 2.11). 

Figure 2.11 (right) shows the statistical analysis of the responses. Obviously the 
additional center loudspeaker decreases the auditory source width. 

Auditory source width is difficult to compare for different directions, and even single
loudspeakers yield auditory source widths that vary with direction. Still, a relatively
constant auditory source width is desirable for moving auditory events. For static
auditory events, the narrowest-possible extent can be desirable.
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Fig. 2.11 Experimental setup and results of experiments of M. Frank (confidence intervals) about 
auditory source width of frontal stereo pairs of the angles ±5°, ..., ±40° and with an additional
center loudspeaker (C)


2.3.1 Model of the Perceived Width 


The angle $2\arccos\|\mathbf{r}_\mathrm{E}\|$ describes the aperture of a cap that is cut off
the unit sphere perpendicular to the $\mathbf{r}_\mathrm{E}$ vector at its tip, as seen from the
origin, see Fig. 2.12. As the length of the $\mathbf{r}_\mathrm{E}$ vector lies between 0 (unclear
direction) and 1 (only one loudspeaker active), this angle stays between 180° and 0°.

M. Frank’s experiments about the auditory source width [10, 28] showed that
stereo pairs of larger half angles $\alpha$ were also heard as wider. Correspondingly, the
$\mathbf{r}_\mathrm{E}$ vector gets shorter as the half angle $\alpha$ increases. For a symmetrical
loudspeaker pair $\boldsymbol{\theta}_{1,2} = (\cos\alpha, \pm\sin\alpha)^\mathrm{T}$ with $g_1 = g_2 = 1$,
the y coordinate of the $\mathbf{r}_\mathrm{E}$ vector cancels and its length is

$$\|\mathbf{r}_\mathrm{E}\| = r_{\mathrm{E},x} = \cos\alpha.$$

The corresponding spherical cap has the same size as the loudspeaker pair,
$2\arccos\|\mathbf{r}_\mathrm{E}\| = 2\alpha$. However, only $\frac{5}{8}$ of this size was indicated
by the listeners of the experiments, which yields the following estimator of the perceived width:

$$\mathrm{ASW} = \frac{5}{8}\cdot\frac{180°}{\pi}\cdot 2\arccos\|\mathbf{r}_\mathrm{E}\|. \tag{2.11}$$


Fig. 2.12 Cap size associated with the $\mathbf{r}_\mathrm{E}$ length model for L+R (left plot) and L+R+C (right plot)

For an additional center loudspeaker with $g_3 = 1$ and $\boldsymbol{\theta}_3 = (1, 0)^\mathrm{T}$, the
model yields

$$\|\mathbf{r}_\mathrm{E}\| = r_{\mathrm{E},x} = \frac{1}{3} + \frac{2}{3}\cos\alpha,$$

an increase in length that matches the narrower width found in the experiments, as
$\arccos\|\mathbf{r}_\mathrm{E}\| < \alpha$, see Figs. 2.13 and 2.12.
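The width model above can be tried out numerically. The following is a minimal sketch, assuming unit gains and 2D direction vectors as in the text; the function names (`energy_vector`, `asw_degrees`, `stereo_pair`) are illustrative and not taken from the book or its data project.

```python
import math

def energy_vector(gains, directions):
    """r_E = sum(g^2 * theta) / sum(g^2) for 2D unit direction vectors."""
    energy = sum(g * g for g in gains)
    x = sum(g * g * d[0] for g, d in zip(gains, directions)) / energy
    y = sum(g * g * d[1] for g, d in zip(gains, directions)) / energy
    return x, y

def asw_degrees(gains, directions):
    """Width estimate 5/8 * 2 * arccos(||r_E||), returned in degrees."""
    x, y = energy_vector(gains, directions)
    length = min(1.0, math.hypot(x, y))  # guard against rounding above 1
    return 5.0 / 8.0 * math.degrees(2.0 * math.acos(length))

def stereo_pair(alpha_deg, with_center=False):
    """Symmetric pair at +/- alpha with unit gains, optionally plus a center loudspeaker."""
    a = math.radians(alpha_deg)
    directions = [(math.cos(a), math.sin(a)), (math.cos(a), -math.sin(a))]
    gains = [1.0, 1.0]
    if with_center:
        directions.append((1.0, 0.0))
        gains.append(1.0)
    return gains, directions

for alpha in (5, 10, 20, 30, 40):
    pair = asw_degrees(*stereo_pair(alpha))
    triple = asw_degrees(*stereo_pair(alpha, with_center=True))
    print(f"alpha = {alpha:2d} deg: L+R {pair:5.1f} deg, L+R+C {triple:5.1f} deg")
```

For every half angle, the estimate with the center loudspeaker comes out smaller than for the pair alone, which is the tendency described for the experiments.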


2.4 Coloration 


Although research primarily focuses on the spatial fidelity of multi-loudspeaker
playback, the overall quality of surround sound playback was found to be largely
determined by timbral fidelity (70%) [29]. Loudspeakers in a studio or performance space
are often characterized by different colorations that are caused by different reflection
patterns (most often the wall behind the loudspeaker). When changing the active
loudspeakers, or their number, these differences become audible. On the one hand,
static coloration, e.g. the frequency responses of the loudspeakers, can typically be
equalized. On the other hand, changes in coloration during the movement of a source
cannot be equalized easily and yield annoying comb filters.

Although coloration is often assessed verbally [30], we employ a simple technical
predictor based on the composite loudness level (CLL) by Ono [31, 32]. The CLL
spectrum predicts the perceived coloration and is calculated from the sum of the
loudnesses of both ears in each third-octave band. Studies about loudspeaker and
headphone equalization show that differences in third-octave band levels of less than
1 dB are inaudible to most listeners [33, 34]. This criterion can also be applied to
the perception of coloration, i.e., differences between CLL spectra of less than 1 dB
are assumed to be inaudible.

Pairwise panning between loudspeakers results in a single active loudspeaker for
source directions that coincide with the direction of a loudspeaker and two equally
loud loudspeakers for source directions exactly between two neighboring loudspeakers,
cf. Fig. 2.14. In the second case, the different propagation paths from the two
loudspeakers to the ears create a comb filter. This comb filter is not present for sources
played from a single loudspeaker. Thus, moving a source between the two directions
yields noticeable coloration. This is in contrast to static sources, for which Theile’s
experiments [35] indicated that they are perceived without coloration.

Fig. 2.14 Coloration predicted by composite loudness levels for a single loudspeaker C (black),
two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray)

The actual shape of the aforementioned comb filter depends on the angular distance
between the loudspeakers. The frequency of the first notch and its depth decrease as the
distance increases. This implies that coloration increases for playback with higher
loudspeaker densities.
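How a notch position and depth arise can be illustrated with a strongly simplified two-path model. The sketch below assumes free-field propagation to a single ear, i.e. the sum 1 + a·exp(−j2πfτ) of two loudspeaker signals with a relative delay τ and a relative amplitude a; head shadowing and the binaural loudness summation of the CLL are not modelled, and the relation between angular distance and the resulting path and level differences is not part of the code. The speed of sound and the example values are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def first_notch_hz(path_difference_m):
    """Frequency of the first comb-filter notch, f = 1 / (2 * tau)."""
    delay_s = path_difference_m / SPEED_OF_SOUND
    return 1.0 / (2.0 * delay_s)

def notch_depth_db(level_difference_db):
    """Peak-to-notch range of |1 + a * exp(-j*2*pi*f*tau)| in dB."""
    a = 10.0 ** (-abs(level_difference_db) / 20.0)
    if a >= 1.0:
        return math.inf  # equal levels cancel completely in this idealized model
    return 20.0 * math.log10((1.0 + a) / (1.0 - a))

# Hypothetical path and level differences at one ear, for illustration only.
for path_cm, level_db in [(5.0, 1.0), (10.0, 2.0), (20.0, 4.0)]:
    notch = first_notch_hz(path_cm / 100.0)
    depth = notch_depth_db(level_db)
    print(f"{path_cm:4.1f} cm, {level_db:3.1f} dB: notch at {notch:6.0f} Hz, depth {depth:4.1f} dB")
```

In this model, a larger path difference lowers the first notch and a larger level difference makes it shallower, so both quantities decrease together when both differences grow.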

A similar comb filter is created when using a triplet of loudspeakers with the
same loudspeaker density as the pair, e.g. L, C, R compared to C, R. In order to
avoid a strong increase in source width or annoying phasing effects, the outermost
loudspeakers L and R are strongly reduced in their level, typically to around −12 dB
compared to loudspeaker C. In doing so, the similarity of the comb filters yields
barely any coloration when moving a source between the two directions, cf. Fig. 2.15.

Judging from what is shown above, it appears beneficial to always activate a few
loudspeakers to stabilize the coloration, as opposed to using just one loudspeaker
and moving the playback to another one. Keeping the number of simultaneously
active loudspeakers more or less constant not only prevents coloration of source
movements, it also yields a more constant source width. Because of this relation
between coloration and source width, the fluctuation of $\|\mathbf{r}_\mathrm{E}\|$ is also a
simple predictor of panning-dependent coloration.

In general, the strongest coloration is perceived under anechoic listening condi- 
tions. In reverberant rooms, the additional comb filters introduced by reflections help 
to conceal the comb filters due to multi-loudspeaker playback.

Fig. 2.15 Coloration predicted by composite loudness levels for loudspeaker C with additional
−12 dB from L and R (black), two equally loud loudspeakers C and R (light gray), and their difference
(dashed dark gray)


2.5 Open Listening Experiment Data 


Experimental data from azimuthal localization in frontal and lateral loudspeaker pairs
(Figs. 2.3 and 2.4), azimuthal/elevational localization in horizontal, skew, and vertical
frontal pairs (Fig. 2.5), triangles (Fig. 2.7), and quadrilaterals (Fig. 2.8), as well as the
data of the width experiment in Fig. 2.11, are available online at https://opendata.iem.at
in the listening experiment data project.

The opendata.iem.at listening experiment data project contains evaluation routines
to analyze the 95% confidence intervals: symmetrically, based on means, standard
deviations, and the inverse Student’s t-distribution (CIMEAN.m); more robustly, based
on medians, inter-quartile ranges, and Student’s t-distribution (CI2.m); or, for
two-dimensional data analysis, robust_multivariate_confidence_region.m. The
MATLAB script plot_gathered_data.m reads the formatted listening experiment
data, and its exemplary code generates figures like the ones above.

In order to support others in providing their own listening experiment data, the
MATLAB functions write_experimental_data.m and read_experimental_data.m
are provided on the website.

Chapter 3
Amplitude Panning Using Vector Bases


The method is straightforward and can be used on many
occasions successfully.


Ville Pulkki [1], Ph.D. Thesis, 2001. 


Abstract This chapter describes Ville Pulkki’s famous vector-base amplitude pan- 
ning (VBAP) as the most robust and generic algorithm of amplitude panning that 
works on nearly any surrounding loudspeaker layout. VBAP activates the smallest- 
possible number of loudspeakers, which gives a directionally robust auditory event 
localization for virtual sound sources, but it can also cause fluctuations in width 
and coloration for moving sources. Multiple-direction amplitude panning (MDAP) 
proposed by Pulkki is a modification that increases the number of activated loud- 
speakers. In this way, more direction-independence is achieved at the cost of an 
increased perceived source width and reduced localization accuracy at off-center 
positions. As vector-base panning methods rely on convex hull triangulation, irreg- 
ular loudspeaker layouts yielding degenerate vector bases can become a problem. 
Imaginary loudspeaker insertion and downmix are shown to be a robust method that improves
the behavior, in particular for smaller surround-with-height loudspeaker layouts.
The chapter concludes with some practical examples using free software tools that 
accomplish amplitude panning on vector bases. 


Vector-base amplitude panning (VBAP) was extensively described and investigated
in [2], along with the stabilization of moving sources by adding spread with
multiple-direction amplitude panning (MDAP) [3]. Since then, VBAP and MDAP
have become the most common and popular amplitude panning techniques,
which are particularly robust and can automatically adapt to specific playback layouts.
3.1 Vector-Base Amplitude Panning (VBAP) 


Assuming that the $\mathbf{r}_\mathrm{V}$ model predicts the perceived direction, an intended
auditory event at a panning direction $\boldsymbol{\theta}$, which we call the virtual source, can
theoretically be controlled by the criterion according to V. Pulkki [2]

$$\boldsymbol{\theta} = \sum_{l=1}^{L} \tilde{g}_l\, \boldsymbol{\theta}_l. \tag{3.1}$$

Here, $\boldsymbol{\theta}_l$ are the direction vectors of the loudspeakers involved, and the amplitude
weights $\tilde{g}_l$ need to be normalized for constant loudness

$$g_l = \frac{\tilde{g}_l}{\sqrt{\sum_{l'=1}^{L} \tilde{g}_{l'}^2}}. \tag{3.2}$$

Moreover, the weights $g_l$ should always stay positive to avoid in-head localization
or other irritating listening experiences. For loudspeaker rings around the horizon,
always 1 or 2 loudspeakers contribute to the auditory event; for loudspeakers
arranged on a surrounding sphere, always 1 up to 3 loudspeakers are used, whose
directions must enclose the direction of the desired auditory event, the virtual source.
For the directional stability of the auditory event, the angle enclosed between the
loudspeakers should stay smaller than 90°.

The system of equations for VBAP [2] uses 3 loudspeaker directions and gains to
model the panning direction $\boldsymbol{\theta}$

$$\boldsymbol{\theta} = [\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \boldsymbol{\theta}_3]
\begin{bmatrix} \tilde{g}_1 \\ \tilde{g}_2 \\ \tilde{g}_3 \end{bmatrix}
= \mathbf{L}\,\tilde{\mathbf{g}}
\quad\Rightarrow\quad \tilde{\mathbf{g}} = \mathbf{L}^{-1}\boldsymbol{\theta},
\qquad \mathbf{g} = \frac{\tilde{\mathbf{g}}}{\|\tilde{\mathbf{g}}\|}. \tag{3.3}$$


The selection of the activated loudspeaker triplet is preceded by forming all triplets
of the convex hull spanned by all the given playback loudspeakers. To find the
loudspeaker triplet that needs to be activated, the list of all triplets is searched
for the one with all-positive weights, $g_1 > 0$, $g_2 > 0$, $g_3 > 0$.
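The triplet search and the inversion described above can be sketched in a few lines. This is a minimal illustration and not Pulkki’s reference implementation: the triplets of an octahedral layout are listed by hand instead of being computed from a convex hull, the 3×3 system is solved by Cramer’s rule, and weights that are zero up to a small tolerance are accepted so that directions on a loudspeaker or on an edge are also found. All names are hypothetical.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))

def triple_product(a, b, c):
    """a . (b x c), i.e. the determinant of the matrix with columns a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def solve_3x3(columns, rhs):
    """Solve [c1 c2 c3] w = rhs by Cramer's rule; None if the base is degenerate."""
    c1, c2, c3 = columns
    det = triple_product(c1, c2, c3)
    if abs(det) < 1e-12:
        return None
    return (triple_product(rhs, c2, c3) / det,
            triple_product(c1, rhs, c3) / det,
            triple_product(c1, c2, rhs) / det)

def vbap(theta, speakers, triplets, eps=1e-9):
    """Return one gain per loudspeaker; only the enclosing triplet is non-zero."""
    for triplet in triplets:
        weights = solve_3x3([speakers[i] for i in triplet], theta)
        if weights is not None and all(w >= -eps for w in weights):
            weights = [max(w, 0.0) for w in weights]
            norm = math.sqrt(sum(w * w for w in weights))
            gains = [0.0] * len(speakers)
            for index, w in zip(triplet, weights):
                gains[index] = w / norm  # normalization for constant loudness
            return gains
    raise ValueError("panning direction is not enclosed by any triplet")

# Octahedral layout (front, left, rear, right, top, bottom) with hand-listed faces.
SPEAKERS = [unit_vector(0, 0), unit_vector(90, 0), unit_vector(180, 0),
            unit_vector(-90, 0), unit_vector(0, 90), unit_vector(0, -90)]
TRIPLETS = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4),
            (0, 1, 5), (1, 2, 5), (2, 3, 5), (3, 0, 5)]

print(vbap(unit_vector(45, 30), SPEAKERS, TRIPLETS))
```

For irregular layouts, the hand-listed faces would have to be replaced by an actual convex-hull triangulation of the loudspeaker directions.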

Figure 3.1 shows the localization curve for VBAP between loudspeakers at 0°
and 45° for a centrally seated listener and for one shifted to the left. The experiment
is described in [4]; results were gathered on a 1.8m circle of 8 loudspeakers,
and listeners indicated the perceived direction by naming numbers from a 5° scale
mounted on the loudspeaker setup. For the centrally seated listener, the black whiskers
of the results (95% confidence intervals and medians) indicate a slope mismatch between
the perceived angles and the VBAP panning angles; the ideal curve is represented by the
dashed line, and the mismatch can be understood from the better match of other
exponents γ in Fig. 2.6. The directional spread is quite narrow. For an off-center,
left-shifted listening position, the perceived directions are shown in terms of a 5°
histogram (gray bubbles) in Fig. 3.1. For this off-center position, it becomes clear that
the closest loudspeaker dominates localization within a third of the panning directions.
Still, the directional mapping seems to be monotonic with the panning angle, and the
perceived direction stays within the loudspeaker pair, which is a robust result, at least.

Fig. 3.1 Perceived directions for VBAP between loudspeakers at 0° and 45° from [4]
(panning angle in ° over perceived angle in °). 95% confidence intervals and medians
(black) are for a centrally seated listener in a circle of 2.5m radius.
Localization for left-shifted listener (1.25 m) can become bi-modal, so that 5° bubble histogram is
shown (gray)

Figure 3.2 shows responses from [5], in which listeners adjusted the VBAP panning angle
on amplitude-panned lateral loudspeaker pairs to match reference loudspeakers set up in
steps of 15°. The adjusted VBAP angles match the reference directions fairly well. The
$\mathbf{r}_\mathrm{E}$ vector model (black curve) delivers a better match, with only one
exception at 105°. This motivates VBIP as an alternative strategy.
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Fig. 3.2 VBAP angles on a 60°-spaced horizontal loudspeaker ring starting at 0° (a) or 30° (b), 
perceptually adjusted to match panned pink noise with harmonic-complex acoustic reference in 15° 
steps, from [5]; black curve shows $\mathbf{r}_\mathrm{E}$ model prediction

Fig. 3.3 The width measure $2\arccos\|\mathbf{r}_\mathrm{E}\|$ for a virtual source on a horizontal
and a vertical trajectory (45° azimuth) using VBAP on an octahedral arrangement


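Vector-Base Intensity Panning (VBIP). With nearly the same set of equations, but
improving the perceptual mapping by using the squares of the weights, the auditory event
can be controlled corresponding to the direction of the $\mathbf{r}_\mathrm{E}$ vector

$$\boldsymbol{\theta} = [\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \boldsymbol{\theta}_3]
\begin{bmatrix} \tilde{g}_1^2 \\ \tilde{g}_2^2 \\ \tilde{g}_3^2 \end{bmatrix}
\quad\Rightarrow\quad
\begin{bmatrix} \tilde{g}_1^2 \\ \tilde{g}_2^2 \\ \tilde{g}_3^2 \end{bmatrix}
= [\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \boldsymbol{\theta}_3]^{-1}\boldsymbol{\theta},
\qquad g_l = \frac{\sqrt{\tilde{g}_l^2}}{\sqrt{\sum_{l'=1}^{3}\tilde{g}_{l'}^2}}.$$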
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This formulation appears more contemporary due to the excellent match of the rg 
model to predict experimental results, as shown earlier. 
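The difference between the two formulations can be checked numerically. The sketch below assumes a single, hand-picked loudspeaker triplet and an arbitrary panning direction inside it; it treats the solution of the vector-base system once as amplitude weights (VBAP) and once as squared weights (VBIP) and compares the direction of the resulting energy vector with the panning direction. Names and example angles are hypothetical.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))

def triple_product(a, b, c):
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def solve_3x3(columns, rhs):
    """Cramer's rule for [c1 c2 c3] w = rhs (the base is assumed non-degenerate)."""
    c1, c2, c3 = columns
    det = triple_product(c1, c2, c3)
    return [triple_product(rhs, c2, c3) / det,
            triple_product(c1, rhs, c3) / det,
            triple_product(c1, c2, rhs) / det]

def normalize(values):
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]

def vbap_gains(theta, triplet):
    return normalize(solve_3x3(triplet, theta))

def vbip_gains(theta, triplet):
    squared = solve_3x3(triplet, theta)  # solution interpreted as squared gains
    return normalize([math.sqrt(max(s, 0.0)) for s in squared])

def energy_vector(gains, triplet):
    energy = sum(g * g for g in gains)
    return [sum(g * g * d[k] for g, d in zip(gains, triplet)) / energy for k in range(3)]

def angle_between_deg(u, v):
    u, v = normalize(u), normalize(v)
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))

TRIPLET = [unit_vector(30, 0), unit_vector(-30, 0), unit_vector(0, 45)]  # example base
THETA = unit_vector(20, 10)                                              # example target

for name, panner in (("VBAP", vbap_gains), ("VBIP", vbip_gains)):
    r_e = energy_vector(panner(THETA, TRIPLET), TRIPLET)
    print(f"{name}: r_E deviates by {angle_between_deg(r_e, THETA):.2f} deg")
```

With VBIP the energy vector points exactly into the panning direction by construction, whereas with VBAP it is pulled towards the loudest loudspeaker of the triplet.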


Non-smooth VBAP/VBIP width. If one of the loudspeakers is exactly aligned with
the virtual source for either VBAP or VBIP, e.g. $\boldsymbol{\theta}_1 = \boldsymbol{\theta}$, the
resulting gains are $g_{1,2,3} = (1, 0, 0)$, and therefore only 1 loudspeaker is activated.
For a virtual source centered between 2 loudspeakers, e.g.
$\boldsymbol{\theta}_1 + \boldsymbol{\theta}_2 \propto \boldsymbol{\theta}$, we obtain
$g_{1,2,3} = (1, 1, 0)/\sqrt{2}$, and hereby only 2 loudspeakers are active. This behavior
yields audible variation in the perceived width and coloration. For virtual source
movements that cross a common edge of neighboring loudspeaker triplets, there will
often be unexpectedly pronounced jumps.

Figure 3.3 illustrates the variation of the perceived width with VBAP on an octahedral
arrangement of loudspeakers in the directions
$\boldsymbol{\theta}_l \in \{[\pm 1, 0, 0]^\mathrm{T}, [0, \pm 1, 0]^\mathrm{T}, [0, 0, \pm 1]^\mathrm{T}\}$.
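The width fluctuation can be reproduced for the horizontal trajectory with a small script. The sketch assumes that only the four horizontal loudspeakers of the octahedron are involved for sources at zero elevation, so that the problem reduces to pairwise (2D) VBAP on a ring with 90° spacing; function names are illustrative.

```python
import math

def direction(azimuth_deg):
    az = math.radians(azimuth_deg)
    return (math.cos(az), math.sin(az))

def vbap_ring(phi_deg, ring_deg, eps=1e-9):
    """Pairwise VBAP on a horizontal ring given by ascending azimuths in degrees."""
    theta = direction(phi_deg)
    count = len(ring_deg)
    for i in range(count):
        j = (i + 1) % count
        a, b = direction(ring_deg[i]), direction(ring_deg[j])
        det = a[0] * b[1] - a[1] * b[0]
        if abs(det) < 1e-12:
            continue
        w_a = (theta[0] * b[1] - theta[1] * b[0]) / det
        w_b = (a[0] * theta[1] - a[1] * theta[0]) / det
        if w_a >= -eps and w_b >= -eps:
            w_a, w_b = max(w_a, 0.0), max(w_b, 0.0)
            norm = math.hypot(w_a, w_b)
            gains = [0.0] * count
            gains[i], gains[j] = w_a / norm, w_b / norm
            return gains
    raise ValueError("no enclosing loudspeaker pair found")

def width_deg(gains, ring_deg):
    """Width measure 2 * arccos(||r_E||) in degrees."""
    energy = sum(g * g for g in gains)
    x = sum(g * g * direction(az)[0] for g, az in zip(gains, ring_deg)) / energy
    y = sum(g * g * direction(az)[1] for g, az in zip(gains, ring_deg)) / energy
    return 2.0 * math.degrees(math.acos(min(1.0, math.hypot(x, y))))

RING = [0.0, 90.0, 180.0, 270.0]  # horizontal loudspeakers of the octahedron
for phi in range(0, 91, 15):
    print(f"panning angle {phi:3d} deg: width {width_deg(vbap_ring(phi, RING), RING):5.1f} deg")
```

The measure drops to zero whenever the virtual source coincides with a loudspeaker and reaches its maximum halfway between two loudspeakers.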


3.2 Multiple-Direction Amplitude Panning (MDAP) 


In order to adjust the $\mathbf{r}_\mathrm{E}$ or $\mathbf{r}_\mathrm{V}$ vector not only in direction
but also in length, and thus to control the number of active loudspeakers for moving sound
objects, Pulkki extended VBAP to multiple-direction amplitude panning (MDAP) [3]. Hereby,
not only the perceived width but also the coloration can be held constant.

Direction spread in MDAP. MDAP employs more than one virtual source distributed
around the panning direction as a directional spreading strategy. For horizontal
loudspeaker rings, MDAP can consist of a pair of virtual VBAP sources at the angles
$\pm\alpha$ around the panning direction, $\varphi_\mathrm{s} \pm \alpha$. In a ring of L
loudspeakers with uniform angular spacing of $\frac{360°}{L}$, the angle
$\alpha = 90\% \cdot \frac{180°}{L}$ yields optimally flat width for all panning directions,
as shown for L = 6 in the comparison between MDAP and VBAP in Fig. 3.4. Moreover, MDAP
seems to equalize the aiming of the $\mathbf{r}_\mathrm{E}$ measure to the aiming of the
$\mathbf{r}_\mathrm{V}$ measure, which is the one controlled by VBAP and MDAP.
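The flattening effect of the spread can be checked with a short script. The following sketch assumes a ring of L = 6 loudspeakers, two virtual VBAP sources at the panning direction ±α with α = 0.9 · 180°/L, and a plain sum of the two gain vectors followed by energy normalization; this is only one simple way to combine the virtual sources, and all names are hypothetical.

```python
import math

def direction(azimuth_deg):
    az = math.radians(azimuth_deg)
    return (math.cos(az), math.sin(az))

def vbap_ring(phi_deg, ring_deg, eps=1e-9):
    """Pairwise VBAP on a horizontal ring given by ascending azimuths in degrees."""
    theta = direction(phi_deg)
    count = len(ring_deg)
    for i in range(count):
        j = (i + 1) % count
        a, b = direction(ring_deg[i]), direction(ring_deg[j])
        det = a[0] * b[1] - a[1] * b[0]
        if abs(det) < 1e-12:
            continue
        w_a = (theta[0] * b[1] - theta[1] * b[0]) / det
        w_b = (a[0] * theta[1] - a[1] * theta[0]) / det
        if w_a >= -eps and w_b >= -eps:
            w_a, w_b = max(w_a, 0.0), max(w_b, 0.0)
            norm = math.hypot(w_a, w_b)
            gains = [0.0] * count
            gains[i], gains[j] = w_a / norm, w_b / norm
            return gains
    raise ValueError("no enclosing loudspeaker pair found")

def mdap_ring(phi_deg, ring_deg, alpha_deg):
    """Sum of two virtual VBAP sources at phi +/- alpha, re-normalized in energy."""
    upper = vbap_ring(phi_deg + alpha_deg, ring_deg)
    lower = vbap_ring(phi_deg - alpha_deg, ring_deg)
    summed = [u + v for u, v in zip(upper, lower)]
    norm = math.sqrt(sum(g * g for g in summed))
    return [g / norm for g in summed]

def width_deg(gains, ring_deg):
    energy = sum(g * g for g in gains)
    x = sum(g * g * direction(az)[0] for g, az in zip(gains, ring_deg)) / energy
    y = sum(g * g * direction(az)[1] for g, az in zip(gains, ring_deg)) / energy
    return 2.0 * math.degrees(math.acos(min(1.0, math.hypot(x, y))))

def width_fluctuation(panner, ring_deg):
    widths = [width_deg(panner(phi), ring_deg) for phi in range(360)]
    return max(widths) - min(widths)

RING = [60.0 * k for k in range(6)]
ALPHA = 0.9 * 180.0 / len(RING)

vbap_range = width_fluctuation(lambda phi: vbap_ring(phi, RING), RING)
mdap_range = width_fluctuation(lambda phi: mdap_ring(phi, RING, ALPHA), RING)
print(f"VBAP width fluctuation: {vbap_range:.1f} deg, MDAP: {mdap_range:.1f} deg")
```

Varying `ALPHA` in this script shows how the width measure trades flatness against overall width for other spread angles.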


Listening experiment results. Experiments from [4] in Fig. 3.5 investigate the perceived
width for two possible horizontal loudspeaker ring layouts, both with 45°
spacings, but one starting at 0° (“0”) and the other at 22.5° (“1/2”). Widths of MDAP
with a direction spread of α = 22.5° are perceived as similar on both
ring layouts, while VBAP yields significantly narrower results for panning onto the
frontal loudspeaker in the “0” layout, which activates only a single loudspeaker.
Note that VBAP1/2 and MDAP1/2 are identical with α = 22.5° and were treated as
one condition.

Moreover, a more constant width measure also describes a more constant number 
of activated loudspeakers while panning. Figure 3.6 shows that listeners can hear 
the difference in coloration changes with rotatory panning using pink noise and a 
constant speed. The figure shows that coloration fluctuations of MDAP are always 
clearly smaller than with VBAP on similar loudspeaker rings. Moreover, coloration 
changes are more pronounced on rings of 16 loudspeakers than with 8 loudspeakers, 
which is explained by their faster fluctuation. 

Figure 3.7 shows the results from [6] for a central and a left-shifted off-center
listening position when using MDAP on an 8-channel ring of loudspeakers. At the
central listening position, the perceived directional spread around the loudspeaker
positions 0° and 45° increases as expected, as indicated by the whiskers
(95% confidence intervals and medians). Moreover, the spread of MDAP seems to
slightly decrease the slope mismatch between the underlying VBAP algorithm and
the perceptual curve around the 22.5° direction.

Although MDAP enforces a larger number of active loudspeakers, its localization
is still similarly robust as that of VBAP, also at off-center listening positions.
The perceived direction can be assumed to stay at least confined within a
strictly directionally limited activation of loudspeakers. Correspondingly, the gray
5°-histogram bubbles of Fig. 3.7 indicate the perceived directions when the listener
is located left-shifted off-center. While localization is slightly attracted by the
closer loudspeaker at 0°, the larger spread causes
a more monotonic outcome that is less split than with VBAP in Fig. 3.1.

For a more exhaustive study, Frank used 6 loudspeakers on the horizon and asked
his listeners to align an MDAP pink-noise direction with acoustical
references every 15° (harmonic complex) by adjusting the panning direction [5].
The results in Fig. 3.8 contain 24 answers from 6 subjects responding four times
(by repetition and symmetrization). The black line shows directions indicated by the
$\mathbf{r}_\mathrm{E}$ vector model for the tested conditions. The confidence intervals of the
adjusted MDAP angles match both the reference directions and the predictions
by the $\mathbf{r}_\mathrm{E}$ vector model quite well, in particular for angles between 0° and 90°
(except 75°) for the ring starting at 0°, and from 0° to 120° for the 30°-rotated ring.
The mismatch is much less than 4° for panning angles < 90°.

Fig. 3.7 Perceived directions for MDAP panning on an 8-channel 2.5m radius loudspeaker ring
within the interval [0°, 45°] at a central (black medians and 95% confidence whiskers) and 1.25 m
left-shifted off-center listening position (gray 5° bubble histogram); dashed line indicates ideal
panning curve

Fig. 3.8 MDAP pink-noise directions on horizontal rings of 60°-spaced loudspeakers adjusted to
perceptually match reference loudspeaker directions (harmonic complex) every 15°. Markers and
whiskers indicate 95% confidence intervals and medians, black curve the $\mathbf{r}_\mathrm{E}$ vector model

Fig. 3.9 The width measure $2\arccos\|\mathbf{r}_\mathrm{E}\|$ for virtual sources on a horizontal and
vertical path on an octahedron setup using MDAP with additional 8 half-amplitude virtual sources
at 45° distance to the main virtual source


MDAP with 3D loudspeaker layouts. For more arbitrary 3D loudspeaker arrangements,
the multiple directions could be arranged ring-like, see Fig. 3.9. This arrangement
uses 8 additional virtual sources inclined by 45° with respect to the main virtual source.

At least mathematically, however, the amplitudes and angles of the virtual sources need
to be post-optimized in order to accurately match the desired $\mathbf{r}_\mathrm{V}$ or
$\mathbf{r}_\mathrm{E}$ vector in direction and length on irregular loudspeaker arrangements,
cf. [7]. Non-uniform $\mathbf{r}_\mathrm{V}$ vector lengths of the individual virtual sources
involved cause a distorted resultant vector. In particular, their superposition is distorted
towards those of the multiple virtual source directions with the longest
$\mathbf{r}_\mathrm{V}$ vectors. Epain’s article [7] proposes an optimization that retrieves the
optimal orientation and weighting of the multiple virtual sources for every panning direction.


3.3 Challenges in 3D Triangulation: Imaginary 
Loudspeaker Insertion and Downmix 


Surrounding loudspeaker hemispheres typically exhibit the following two problems:

• Loudspeaker rectangles at the sides of standard setups with height (ITU-R
  BS.2051-0 [8]) can be decomposed ambiguously into triangles at the sides, back,
  and top. This can yield noticeable ranges within loudspeaker quadrilaterals, in
  which auditory events are unexpectedly created by just two of the loudspeakers.

• Signals of virtual sources below the horizon usually get lost.

The problem of unfavorable or ambiguous triangulations into loudspeaker triplets
appears subtle; however, it can cause clearly audible deficiencies, especially when
ambiguous triangulation yields asymmetric behavior between left and right, e.g., for
the top, rear, and lateral directions, where we would manually define loudspeaker
quadrilaterals instead of triangles, see [9].

As surrounding loudspeaker hemispheres are typically open by 180° towards
below, VBAP/VBIP/MDAP is numerically unstable and theoretically useless for
any panning direction below. Although the absence of loudspeakers below renders
downwards amplitude panning theoretically infeasible, it is still reasonable to
preserve signals of virtual sources that are meant for playback on spherically surrounding
setups.

In the case of the asymmetric loudspeaker rectangles, see Fig. 3.10, and a missing
lower hemisphere of surrounding loudspeakers, the insertion of one or more imaginary
loudspeakers in the vertical direction (nadir) or in the middle of the rectangle
(the average direction vector) has proven to be a useful strategy, e.g. in [10]. Any
imaginary loudspeaker aims at either extending the admissible triangulation towards
open parts of the surround loudspeaker setup, or at covering parts with potential
asymmetry, see [9].




Fig. 3.10 VBAP on the ITU D (4 + 5 + 0) setup [8]. Top row: Insertion of imaginary loudspeaker 
at nadir preserves loudness of downward-panned signals, shown for vertical path and E values in 
dB for factors { wo R 0} to re-distribute the signal to the 5 existing horizontal loudspeakers. 
Middle row: Due to typical triangulation, two left-right mirrored vertical paths (45° azimuth) yield 
asymmetric behavior, as shown by the $2\arccos\|\mathbf{r}_\mathrm{E}\|$ measure. Bottom row: Insertion
of imaginary loudspeaker at 65° fixes symmetry and feeds the signal with a factor $1/\sqrt{4}$ to the
4 existing neighbor loudspeakers
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The signal of the imaginary loudspeaker can be dealt with in two ways 


• it can be dismissed, e.g., for a loudspeaker below at nadir; this would still yield a
  signal near the closest horizontal pair of loudspeakers for virtual sources panned
  to below-horizontal directions, unless panned exactly to nadir;

• it can be down-mixed to the neighboring M loudspeakers by a factor of $1/\sqrt{M}$ or
  less as in Fig. 3.10; alternatively, for control yielding perfectly flat E measures, the
  resulting down-mixed gain vector can be re-normalized by Eq. (3.2).
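The two options can be sketched as a small post-processing step applied to the gains of the real loudspeakers and the gain of the imaginary loudspeaker. This is only an illustration under the assumptions stated in the comments; the function name, argument layout, and example gains are hypothetical.

```python
import math

def downmix_imaginary(gains_real, gain_imaginary, neighbors, factor=None, renormalize=False):
    """Distribute the imaginary loudspeaker's gain to its real neighbors.

    gains_real     : VBAP gains of the real loudspeakers
    gain_imaginary : VBAP gain assigned to the imaginary loudspeaker
    neighbors      : indices of the M real loudspeakers that receive the down-mix
    factor         : down-mix factor; defaults to 1/sqrt(M), 0 dismisses the signal
    renormalize    : re-normalize the result to unit energy afterwards
    """
    if factor is None:
        factor = 1.0 / math.sqrt(len(neighbors))
    mixed = list(gains_real)
    for index in neighbors:
        mixed[index] += factor * gain_imaginary
    if renormalize:
        norm = math.sqrt(sum(g * g for g in mixed))
        if norm > 0.0:
            mixed = [g / norm for g in mixed]
    return mixed

def energy(gains):
    return sum(g * g for g in gains)

HORIZONTAL = list(range(5))  # five horizontal loudspeakers around a nadir loudspeaker

# Source panned exactly to nadir: all gain sits on the imaginary loudspeaker.
at_nadir = downmix_imaginary([0.0] * 5, 1.0, HORIZONTAL)
# Source slightly below the horizon: one real loudspeaker plus the imaginary one.
below = downmix_imaginary([0.6, 0.0, 0.0, 0.0, 0.0], 0.8, HORIZONTAL)
below_flat = downmix_imaginary([0.6, 0.0, 0.0, 0.0, 0.0], 0.8, HORIZONTAL, renormalize=True)

print(f"E at nadir: {energy(at_nadir):.3f}")
print(f"E below horizon: {energy(below):.3f}, re-normalized: {energy(below_flat):.3f}")
```

The example suggests why a factor smaller than 1/√M or a final re-normalization can be useful: the down-mixed signal adds coherently to the gains that are already present, so the energy can exceed that of a regular panning direction.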


3.4 Practical Free-Software Examples 


3.4.1 VBAP/MDAP Object for Pd 


There is a classic VBAP/MDAP implementation by Ville Pulkki that is available
as an external in Pure Data (Pd). The example in Fig. 3.11 illustrates its use together with
some other useful externals in Pd. Software requirements are:

• pure-data (free, http://puredata.info/downloads/pure-data)
• iemmatrix (free download within pure-data)
• zexy (free download within pure-data)
• vbap (free download within pure-data).

[Pd patch of Fig. 3.11; recoverable elements: loadbang; message
"define_loudspeakers 3 0 0 90 0 180 0 -90 0 0 90 0 -90" (octahedron layout); azimuth
control; panning angles and spread; VBAP/MDAP gains; master level; multiline~;
dac~ 1 2 3 4 5 6 (loudspeakers); level meters labeled Front, Left, Rear, Right, Top, Bottom]

Fig. 3.11 Vector-Base/Multi-Direction Amplitude Panning (VBAP/MDAP) example in pure data
(Pd) using Pulkki’s [vbap] external for an octahedral layout


3.4.2 SPARTA Panner Plugin 


The SPARTA Panner under http://research.spa.aalto.fi/projects/sparta_vsts/plugins.html
provides an interface for vector-base amplitude panning (VBAP) and multiple-direction
amplitude panning (MDAP), see Fig. 3.12, with frequency-dependent loudness
normalization by $\sqrt[p]{\sum_l g_l^p}$ adjustable to the listening conditions, see Laitinen [11].

The parameter DTT can be varied between 0 (standard, frequency-independent
VBAP normalization, i.e. diffuse-field normalization), 0.5 for typical listening
environments, and 1 for the anechoic chamber. The plugin allows the user to either manually
enter the azimuth and elevation angles of multiple panning directions (if more than
one input signal is used) and of the playback loudspeakers, or to import/export them
from/to preset files. Of course, all panning directions can be time-varying and can be
moved by mouse, automation, or controls.




Fig. 3.12 The Panner VST plug-in from Aalto University’s SPARTA plug-in suite manages 
Vector-Base Amplitude Panning within sequencers supporting VST

Chapter 4
Ambisonic Amplitude Panning
and Decoding in Higher Orders


...the second-order Ambisonic system offers improved imaging 
over a wider area than the first-order system and is suitable for 
larger rooms. 


Jeffrey S. Bamford [1], Canadian Acoustics, 1994. 


Abstract Already in the 1970s, the idea of using continuous harmonic functions of 
scalable resolution was described by Cooper and then Gerzon, who introduced the 
name Ambisonics. This chapter starts by reviewing properties of first-order horizontal 
Ambisonics, using an interpretation in terms of panning functions. The required
mathematical formulations for 3D higher-order Ambisonics are then developed, with
the idea to improve the directional resolution. Based on this formalism, ideal loud- 
speaker layouts can be defined for constant loudness, localization, and width, accord- 
ing to the previous models. The chapter discusses how Ambisonics can be decoded 
to less ideal, typical loudspeaker setups for studios, concerts, sound-reinforcement 
systems, and to headphones. The behavior is analyzed by a rich variety of listen- 
ing experiments and for various decoding applications. The chapter concludes with 
example applications using free software tools. 


Cooper [2] used higher-order angular harmonics to formulate circular panning of
auditory events. Due to the work of Fellgett [3], Gerzon [4], and Craven [5], the term
Ambisonics became common for technology using spherical harmonic functions.
Around the early 2000s, most notably Bamford [6], Malham [7], Poletti [8], Jot [9],
and Daniel [10] pioneered the development of higher-order Ambisonic panning and
decoding, along with Ward and Abhayapala [11], Dickens [12], and, at the lab of the
authors, Sontacchi [13].

Another leap happened around 2010, when Ambisonic decoding to loudspeakers
could be largely improved by considering regularization methods [14], singular-value
decomposition [15], and all-round Ambisonic decoding (AllRAD) [15, 16],
a combination of vector-base panning techniques with Ambisonics, yielding the most
robust and flexible higher-order decoding method known today.

© The Author(s) 2019
F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19,
https://doi.org/10.1007/978-3-030-17207-7_4

For headphones, after the work of Jot [9] that outlined the basic problems
of binaural decoding in the 1990s, Sun, Bernschütz, Ben-Hur, and Brinkmann
[17-19] made important contributions to binaural decoding, and we consider the TAC
and MagLS decoders by Zaunschirm and Schörkhuber [20, 21] as the essential binaural
decoders. Both remove HRTF delays or optimize HRTF phases at high frequencies
to avoid spectral artifacts. By interaural covariance correction, MagLS/TAC manage
to play back diffuse fields consistently, using the formalism of Vilkamo et al. [22].


4.1 Direction Spread in First-Order 2D Ambisonics 


In 2D first-order Ambisonics as discussed in Chap. 1, the directional mapping of a
single sound source from the angle $\varphi_\mathrm{s}$ to the direction of each loudspeaker
is described by the shape of the panning function (or direction-spread function) in
Eq. (1.17). The directional spreading is not infinitely narrow, but determined by what
can be represented by first-order directivity patterns. Consequently, sound from the
angle $\varphi_\mathrm{s}$ will be mapped by a dipole pattern aligned with the source and an
additional omnidirectional pattern. We can involve a spread parameter a to make the
directional spread to the loudspeaker system adjustable and either cardioid-shaped ($a = 1$),
2D-supercardioid-shaped ($a = \sqrt{2}$), or 2D-hypercardioid-shaped ($a = 2$), using:

$$g(\varphi) = 1 + a\cos(\varphi - \varphi_\mathrm{s}). \tag{4.1}$$

This function represents how first-order Ambisonic panning would distribute a mono
signal to loudspeakers. With the loudspeaker positions described by the set of angles
$\{\varphi_l\}$, a vector of amplitude-panning gains with an entry for each loudspeaker could
be determined by sampling the direction-spread function:

$$\mathbf{g} = \begin{bmatrix} g_1 \\ \vdots \\ g_L \end{bmatrix}
= 1 + a \begin{bmatrix} \cos(\varphi_1 - \varphi_\mathrm{s}) \\ \vdots \\ \cos(\varphi_L - \varphi_\mathrm{s}) \end{bmatrix}. \tag{4.2}$$

With these gain values, we evaluate models of perceived loudness, direction, and 
width, as introduced in Chap. 2, in order to enter a discussion of perceptual goals. 

If the loudspeaker directions $\{\boldsymbol{\theta}_l\}$ are chosen suitably, it is possible to obtain
panning-independent loudness, direction, and width measures $E = \sum_l g_l^2$,
$\mathbf{r}_\mathrm{E} = \frac{1}{E}\sum_l g_l^2\,\boldsymbol{\theta}_l$, and
$\frac{5}{8}\,\frac{180°}{\pi}\,2\arccos\|\mathbf{r}_\mathrm{E}\|$. How is it done?

For first-order 2D Ambisonics, it is theoretically optimal to use at least a ring of
4 loudspeakers with uniform angular spacing and $a = \sqrt{2}$, which is easily checked
with the aid of a computer, cf. Fig. 4.1, and explained below and in Sect. 4.4.
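Such a computer check could look as follows. The sketch assumes a ring of 4 loudspeakers with 90° spacing, samples the first-order panning function at the loudspeaker angles, and evaluates loudness, direction error, and the width-related angle for the cardioid, 2D-supercardioid, and 2D-hypercardioid values of the spread parameter a; function and variable names are illustrative.

```python
import math

def ring_measures(phi_s_deg, a, count=4):
    """Return (E, direction error in deg, arccos||r_E|| in deg) for one panning angle."""
    ring = [360.0 * k / count for k in range(count)]
    gains = [1.0 + a * math.cos(math.radians(phi - phi_s_deg)) for phi in ring]
    energy = sum(g * g for g in gains)
    x = sum(g * g * math.cos(math.radians(phi)) for g, phi in zip(gains, ring)) / energy
    y = sum(g * g * math.sin(math.radians(phi)) for g, phi in zip(gains, ring)) / energy
    error = (math.degrees(math.atan2(y, x)) - phi_s_deg + 180.0) % 360.0 - 180.0
    width_angle = math.degrees(math.acos(min(1.0, math.hypot(x, y))))
    return energy, error, width_angle

def spread(values):
    return max(values) - min(values)

for a in (1.0, math.sqrt(2.0), 2.0):
    results = [ring_measures(phi_s, a) for phi_s in range(0, 360, 7)]
    energies, errors, widths = zip(*results)
    print(f"a = {a:.4f}: E spread {spread(energies):.1e}, "
          f"max |direction error| {max(abs(e) for e in errors):.1e} deg, "
          f"arccos||r_E|| = {widths[0]:.2f} deg (spread {spread(widths):.1e})")
```

On this ring, all three measures stay constant over the panning angle for each of the tested values of a; only the size of the width-related angle depends on a.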


Fig. 4.1 Width-related angle $\arccos\|\mathbf{r}_\mathrm{E}\|$ and angular error
$\angle\mathbf{r}_\mathrm{E} - \varphi_\mathrm{s}$ for first-order Ambisonics on a ring with 90° spaced
loudspeakers using a cardioid, or 2D super-/hyper-cardioid patterns, plotted over the panning
angle in degrees; the measures are all panning-invariant and the super-cardioid weighting is
the 2D max-$\mathbf{r}_\mathrm{E}$ weighting with $\arccos\|\mathbf{r}_\mathrm{E}\| = 45°$

Direction spread in FOA. The panning-function interpretation with its directional
spread has some similarity to MDAP, with its attempt to directionally spread an
amplitude-panned signal. Similar to the discrete virtual spread by
$\pm\alpha = \arccos\|\mathbf{r}_\mathrm{E}\|$ around the panning direction in MDAP, the virtual
direction spread of first-order Ambisonics is described by its continuous panning function
$g(\varphi)$ in Eq. (4.1). To inspect the continuous function by the $\mathbf{r}_\mathrm{E}$ measure
defined in Eq. (2.7), we may evaluate an integral over the panning function instead of the sum.
Because of the symmetry around $\varphi_\mathrm{s}$, we may set for convenience
$\varphi_\mathrm{s} = 0$, which knowingly causes $r_{\mathrm{E},y} = 0$, and evaluate

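$$ r_{\mathrm{E},x} = \frac{\int_{-\pi}^{\pi} g^2(\varphi) \cos\varphi \, d\varphi}{\int_{-\pi}^{\pi} g^2(\varphi)\, d\varphi} = \frac{\int_{-\pi}^{\pi} [1 + 2a\cos\varphi + a^2\cos^2\varphi] \cos\varphi \, d\varphi}{\int_{-\pi}^{\pi} [1 + 2a\cos\varphi + a^2\cos^2\varphi] \, d\varphi} = \frac{a}{1 + \frac{a^2}{2}}. \tag{4.3} $$

The maximum of $r_{\mathrm{E},x} = \frac{a}{1 + a^2/2}$ is found by $\frac{d}{da} r_{\mathrm{E},x} = \frac{1 - a^2/2}{(1 + a^2/2)^2} = 0$, hence at
$a = \sqrt{2}$. Consequently, the 2D max-$\boldsymbol{r}_\mathrm{E}$ weight $a = \sqrt{2}$ gives $r_{\mathrm{E},x} = \frac{a}{2} = \frac{1}{\sqrt{2}}$ and yields the angle
$\arccos \|\boldsymbol{r}_\mathrm{E}\| = 45^\circ$. This would resemble a 2D-MDAP-equivalent source spread to
$\pm 45^\circ$. Note that first-order Ambisonics cannot map to a smaller spread than this.
Only higher orders permit reducing this spread further to a desired angle below 90°.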


Ideal loudspeaker layouts. Not only is the directional aiming of the virtual, continuous
first-order Ambisonic panning function ideal and its width panning-invariant,
also its loudness measure is panning-invariant. However, decoding to a physical
loudspeaker setup can degrade the ideal behavior. For which loudspeaker layout are
these properties preserved by sampling decoding?

The 2D first-order Ambisonic components (W, X, Y) correspond to $\{1, \cos\varphi, \sin\varphi\}$
patterns, a first-order Fourier series in the angle. When the playback directions
are sampled by $L = 3$ uniformly spaced loudspeakers on the horizon, the sampling theorem
for this series is already fulfilled. Accordingly, Parseval's theorem ensures
panning-invariant loudness $E$ for any panning direction.

For an ideal $\boldsymbol{r}_\mathrm{E}$ measure, however, one more loudspeaker is required, $L \geq 4$
for a uniformly spaced horizontal ring. To explain this increase exhaustively, the
concept of circular/spherical polynomials and t-designs will be introduced in this
chapter. For a brief explanation, $g^2(\varphi)$ is a second-order expression and therefore to
represent the ideal constant loudness $E = \int g^2(\varphi)\, d\varphi$ of the continuous panning
function consistently after discretization $E = \frac{2\pi}{L} \sum_{l=1}^{L} g_l^2$, it requires $L = 3$ uniformly
spaced loudspeakers, as argued before. By contrast, the expressions $g^2(\varphi) \cos\varphi$
and $g^2(\varphi) \sin\varphi$ are third-order and appear in $\boldsymbol{r}_\mathrm{E}\, E = \int g^2(\varphi)\, [\cos\varphi, \sin\varphi]^T d\varphi$.
Consequently, ideal mapping of $\boldsymbol{r}_\mathrm{E}$ (direction and width) requires at least one more
loudspeaker $L = 4$ for a uniformly spaced arrangement to make the continuous and
the discretized form $\boldsymbol{r}_\mathrm{E}\, E = \frac{2\pi}{L} \sum_{l=1}^{L} g_l^2\, [\cos\varphi_l, \sin\varphi_l]^T$ perfectly equal.
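The order argument can be tried out numerically. The following sketch compares uniformly spaced rings of 3, 4, and 5 loudspeakers for the first-order panning function $1 + a\cos(\varphi - \varphi_s)$ with $a = \sqrt{2}$; the function name `energy_vector` and the chosen ring sizes are illustrative assumptions, not part of the original text.

```python
import numpy as np

def energy_vector(phi_s, L, a=np.sqrt(2.0)):
    """Discretized E and r_E for first-order panning on a uniform ring of L loudspeakers."""
    phi_l = 2.0 * np.pi * np.arange(L) / L
    g2 = (1.0 + a * np.cos(phi_l - phi_s)) ** 2              # squared gains (second order)
    E = (2.0 * np.pi / L) * np.sum(g2)                       # discretized loudness integral
    r_E = np.array([np.sum(g2 * np.cos(phi_l)),
                    np.sum(g2 * np.sin(phi_l))]) / np.sum(g2)  # third-order terms
    return E, r_E

pan = np.deg2rad(np.arange(0.0, 360.0, 1.0))
ripple = {}
for L in (3, 4, 5):
    E = np.array([energy_vector(p, L)[0] for p in pan])
    r_len = np.array([np.linalg.norm(energy_vector(p, L)[1]) for p in pan])
    ripple[L] = (np.ptp(E), np.ptp(r_len))
    print(f"L = {L}: E ripple {np.ptp(E):.1e}, ||r_E|| ripple {np.ptp(r_len):.3f}")
```

With 3 loudspeakers the loudness measure stays constant while the length of $\boldsymbol{r}_\mathrm{E}$ fluctuates with the panning angle; from 4 loudspeakers on, both measures stay constant.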


Towards a higher-order panning function. An Nth-order cardioid pattern is obtained
from the cardioid pattern by taking its Nth power

$$ g_N(\varphi) = \frac{1}{2^N} (1 + \cos\varphi)^N, $$

which makes it narrower. With $N = 2$, this becomes, using $\cos^2\varphi = \frac{1}{2}(1 + \cos 2\varphi)$,

$$ g_2(\varphi) = \frac{1}{4} (1 + 2\cos\varphi + \cos^2\varphi) = \frac{1}{8} (3 + 4\cos\varphi + \cos 2\varphi). $$

More generally, Chebyshev polynomials $T_m(\cos\varphi) = \cos m\varphi$, cf. [23, Eq. 3.11.6],
can be used to argue that there is always a fully equivalent cosine series describing
the higher-order 2D panning function in the azimuth angle

$$ g(\varphi) = \sum_{m=0}^{N} a_m \cos m\varphi. \tag{4.4} $$


Rotated panning function. In first-order Ambisonics, panning functions consist of
an omnidirectional part, $\cos(0\varphi) = 1$, and a figure-of-eight to x, $\cos\varphi$, but that was
not all: Recording and playback also required a figure-of-eight pattern to y, $\sin\varphi$.
The additional component allows expressing rotated first-order directivities by a basis
set of fixed directivities. For higher orders, a panning function rotated to a non-zero
aiming $\varphi_s \neq 0$

$$ g(\varphi - \varphi_s) = \sum_{m=0}^{N} a_m \cos[m(\varphi - \varphi_s)] \tag{4.5} $$

can be re-expressed by the addition theorem $\cos(\alpha + \beta) = \cos\alpha \cos\beta - \sin\alpha \sin\beta$
into a series involving the sinusoids (odd symmetric part of a Fourier series),

$$ g(\varphi - \varphi_s) = \sum_{m=0}^{N} a_m [\cos m\varphi_s \cos m\varphi + \sin m\varphi_s \sin m\varphi] = \sum_{m=0}^{N} a_m^{(c)} \cos m\varphi + \sum_{m=0}^{N} a_m^{(s)} \sin m\varphi. \tag{4.6} $$

We conclude: Higher-order Ambisonics in 2D (and the associated set of theoretical
microphone directivities) is based on the Fourier series in the azimuth angle $\varphi$.
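As a small numerical illustration of Eqs. (4.5) and (4.6), the second-order cardioid with cosine-series coefficients $a_m = [3, 4, 1]/8$ can be rotated by re-weighting a fixed set of cosine and sine patterns. The aiming of 40° and all variable names below are arbitrary choices for this sketch.

```python
import numpy as np

a = np.array([3.0, 4.0, 1.0]) / 8.0     # cosine-series coefficients of g_2, Eq. (4.4)
phi_s = np.deg2rad(40.0)                # aiming direction (arbitrary example value)
m = np.arange(a.size)

# coefficients of the fixed cosine and sine patterns, Eq. (4.6)
a_cos = a * np.cos(m * phi_s)
a_sin = a * np.sin(m * phi_s)

phi = np.linspace(-np.pi, np.pi, 361)
basis_cos = np.cos(np.outer(m, phi))    # fixed patterns cos(m phi)
basis_sin = np.sin(np.outer(m, phi))    # fixed patterns sin(m phi)

g_series = a_cos @ basis_cos + a_sin @ basis_sin      # rotated via Eq. (4.6)
g_direct = 0.25 * (1.0 + np.cos(phi - phi_s)) ** 2    # rotated second-order cardioid
max_dev = np.max(np.abs(g_series - g_direct))
print(f"largest deviation between both forms: {max_dev:.1e}")
```

Only the coefficients depend on the aiming direction, while the patterns themselves stay fixed, which is what makes the cosine and sine series usable as a recording and playback basis.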
4.2 Higher-Order Polynomials and Harmonics


The previous section required that direction and length of the $\boldsymbol{r}_\mathrm{E}$ vector resulting from
amplitude panning on loudspeakers matched the desired auditory event direction and
width. Harmonic functions with strict symmetry around a panning direction $\boldsymbol{\theta}_s$ will
help us in achieving this goal and in defining good sampling.

Regardless of the dimensions, be it in 2D or 3D, we desire to define continuous and
resolution-limited axisymmetric functions around the panning direction $\boldsymbol{\theta}_s$ to fulfill
our perceptual goals of a panning-invariant loudness $E$, width $\|\boldsymbol{r}_\mathrm{E}\|$, and perfect
alignment between panning direction $\boldsymbol{\theta}_s$ and localized direction $\boldsymbol{r}_\mathrm{E}$. Then we hope
to find suitable directional discretization schemes for ideal loudspeaker layouts, so
that the measures $E$ and $\boldsymbol{r}_\mathrm{E}$ are perfectly reconstructed in playback.

The projection of a variable direction vector $\boldsymbol{\theta}$ onto the panning direction $\boldsymbol{\theta}_s$
always yields the cosine of the enclosed angle $\boldsymbol{\theta}_s^T \boldsymbol{\theta} = \cos\phi$, no matter whether it is
in two or three dimensions. Constructing the panning function based on this
projection readily meets the desired goals. The mth power thereof, $(\boldsymbol{\theta}_s^T \boldsymbol{\theta})^m = \cos^m \phi$,
helps to build an Nth-order power series $g = \sum_{m=0}^{N} a_m (\boldsymbol{\theta}_s^T \boldsymbol{\theta})^m$ to describe a virtual
Ambisonic panning function.

For 2D, such a circular polynomial $g = \sum_{m=0}^{N} a_m (\boldsymbol{\theta}_s^T \boldsymbol{\theta})^m$ contains all
$(N + 1)(N + 2)/2$ mixed powers by $(\boldsymbol{\theta}_s^T \boldsymbol{\theta})^m = (\theta_{xs}\theta_x + \theta_{ys}\theta_y)^m = \sum_{k=0}^{m} \binom{m}{k} (\theta_{xs}\theta_x)^k (\theta_{ys}\theta_y)^{m-k}$
of the direction vectors' entries $\boldsymbol{\theta}_s = [\theta_{xs}, \theta_{ys}]^T$ and $\boldsymbol{\theta} = [\theta_x, \theta_y]^T$.
However, we could already recognize that it only takes $2N + 1$ functions to express
$g = \sum_{m=0}^{N} a_m (\boldsymbol{\theta}_s^T \boldsymbol{\theta})^m = \sum_{m=0}^{N} a_m \cos^m \phi$: First an initial polynomial with relative
azimuth $\phi = \varphi - \varphi_s$, relating to a harmonic series of $N + 1$ cosines or Chebyshev
polynomials $g = \sum_{m=0}^{N} b_m \cos m\phi = \sum_{m=0}^{N} b_m T_m(\boldsymbol{\theta}_s^T \boldsymbol{\theta})$. Then, in terms of absolute
azimuth $\varphi$, the trigonometric addition theorem re-expresses the series into one of
$N + 1$ cosines and $N$ sines, with $T_m(\boldsymbol{\theta}_s^T \boldsymbol{\theta}) = \cos[m(\varphi - \varphi_s)] = \cos m\varphi_s \cos m\varphi + \sin m\varphi_s \sin m\varphi$.
As shown in the upcoming section, we can alternatively obtain such
orthonormal harmonic functions by solving a second-order differential equation that
is generally used to define harmonics, which bears the later benefit that we can use
the approach to define spherical harmonics in three space dimensions.

Spherical polynomials are similar, $g = \sum_{n=0}^{N} a_n (\boldsymbol{\theta}_s^T \boldsymbol{\theta})^n$, involving the
expressions $(\boldsymbol{\theta}_s^T \boldsymbol{\theta})^n = (\theta_{xs}\theta_x + \theta_{ys}\theta_y + \theta_{zs}\theta_z)^n = \sum_{k=0}^{n} \sum_{l=0}^{k} \binom{n}{k} \binom{k}{l} (\theta_{xs}\theta_x)^{l} (\theta_{ys}\theta_y)^{k-l} (\theta_{zs}\theta_z)^{n-k}$.
Again, all these $(N + 1)(N + 2)(N + 3)/6$ combinations would
be too many to form an orthogonal set of basis functions. Moreover, while the different
cosine harmonics are orthogonal axisymmetric functions in 2D, they are not in
3D. On the sphere, the $N + 1$ orthogonal Legendre polynomials $P_n(\cos\phi)$ replace
the cosine series as a basis for $g = \sum_{n=0}^{N} c_n P_n(\cos\phi)$, as shown below. All mathematical
derivations for the sphere rely on the definition of harmonics. They result
in $(N + 1)^2$ spherical harmonics and their addition theorem as a basis in terms of
absolute directions $\frac{2n+1}{4\pi} P_n(\boldsymbol{\theta}_s^T \boldsymbol{\theta}) = \sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_s)\, Y_n^m(\boldsymbol{\theta})$. Dickins' thesis is
interesting for further reading [12].

In both regimes, 2D and 3D, the circular or spherical polynomials concept will be
used to determine optimal layouts, so-called t-designs. Such t-designs are directional
sampling grids that are able to keep the information about the constant part of any
either circular (2D) or spherical (3D) polynomials up to the order $N \leq t$. This will
be a mathematical key property exploited to determine requirements for preserving
$E$ and $\boldsymbol{r}_\mathrm{E}$ measures during Ambisonic playback with optimal loudspeaker setups,
but not only: t-designs also simplify numerical integration of circular or spherical
harmonics to define state-of-the-art Ambisonic decoders or mapping effects.


4.3 Angular/Directional Harmonics in 2D and 3D 


The Laplacian is defined in the D-dimensional Cartesian space as

$$ \Delta = \sum_{j=1}^{D} \frac{\partial^2}{\partial x_j^2}, \tag{4.7} $$

and for any function $f$, the Laplacian $\Delta f$ describes the curvature. Any harmonic
function is proportional to its curvature by an eigenvalue $\lambda$,

$$ \Delta f = -\lambda f, \tag{4.8} $$

and therefore is an oscillatory function. Generally, eigensolutions $\Delta f = -\lambda f$ to
the Laplacian are called harmonics. For suitable eigenvalues $\lambda$, harmonics span an
orthogonal set of basis functions that are typically used for Fourier expansion on a
finite interval. It seems desirable to find such harmonics for functions only exhibiting
directional dependencies, i.e. in the azimuth angle $\varphi$ in 2D, and azimuth and zenith
angle $\varphi, \vartheta$ in 3D.


4.4 Panning with Circular Harmonics in 2D 


For 2 dimensions, Appendix A.3.2 uses the generalized chain rule to convert the
Laplacian of a 2D coordinate system $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ to a polar coordinate system
with the radius $r$ and the angle $\varphi$ to the x axis, $\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial \varphi^2}$. For
functions $\Phi = \Phi(\varphi)$ purely in the angle $\varphi$, the radial derivatives of $\Delta\Phi$ all vanish
and it remains ($\partial \rightarrow d$)

$$ \frac{d^2}{d\varphi^2} \Phi = -\lambda r^2\, \Phi. \tag{4.9} $$

It is only yielding useful solutions with $\lambda r^2 = m^2$, $m \in \mathbb{Z}$, cf. Appendix A.3.4,
Fig. 4.2,

$$ \Phi_m = \frac{1}{\sqrt{2\pi}} \begin{cases} \sqrt{2} \sin(|m|\varphi), & \text{for } m < 0, \\ 1, & \text{for } m = 0, \\ \sqrt{2} \cos(m\varphi), & \text{for } m > 0, \end{cases} \tag{4.10} $$

which defines how to decompose panning functions of limited order $|m| \leq N$. The
harmonics are periodic in azimuth, orthogonal and normalized (orthonormal) on the
period $-\pi \leq \varphi \leq \pi$. Due to their completeness, any square-integrable function $g(\varphi)$
can be expanded into a series of the harmonics using coefficients $\gamma_m$

$$ g(\varphi) = \sum_{m=-\infty}^{\infty} \gamma_m \Phi_m(\varphi). \tag{4.11} $$

For a known function $g(\varphi)$, the coefficients $\gamma_m$ are obtained by the transformation
integral

$$ \gamma_m = \int_{-\pi}^{\pi} g(\varphi)\, \Phi_m(\varphi)\, d\varphi, \tag{4.12} $$

as shown in Appendix Eq. (A.14).

Fig. 4.2 Circular harmonics with $m = -3, \ldots, 3$ plotted as polar diagram using the radius
$R = 20 \lg |\sqrt{\pi}\, \Phi_m|$ and grayscale to distinguish between positive (gray) and negative (black) signs


2D panning function. An infinitely narrow angular range around a desired direction
$|\varphi - \varphi_s| < \varepsilon \rightarrow 0$ is represented by the transformation integral over a Dirac delta
distribution $\delta(\varphi - \varphi_s)$, cf. Appendix Eq. (A.16), so that the coefficients of such a
panning function are

$$ \gamma_m = \Phi_m(\varphi_s). \tag{4.13} $$

As the infinite circular harmonic series is complete, the panning function is

$$ g(\varphi) = \sum_{m=-\infty}^{\infty} \Phi_m(\varphi_s)\, \Phi_m(\varphi) = \delta(\varphi - \varphi_s), \tag{4.14} $$

and in practice we resolution-limit it to the Nth Ambisonic order, $|m| \leq N$, and use
an additional weight $a_m$ that allows us to design its side lobes

$$ g_N(\varphi) = \sum_{m=-N}^{N} a_m\, \Phi_m(\varphi_s)\, \Phi_m(\varphi). \tag{4.15} $$

The max-$\boldsymbol{r}_\mathrm{E}$ panning function [24] uses the weights $a_m = \cos\left(\frac{\pi}{2} \frac{m}{N+1}\right)$, as derived in
Appendix Eq. (A.20). The spread is now adjustable by the order to $\pm\frac{90^\circ}{N+1}$. The result
is shown in Fig. 4.3, compared with no side-lobe suppression when $a_m = 1$ (basic).

Fig. 4.3 2D unweighted $a_m = 1$ basic and weighted max-$\boldsymbol{r}_\mathrm{E}$ Ambisonic panning functions for the
orders $N = 1, 2, 5$

It is easy to recognize: $\Phi_m(\varphi_s)$ represents the recorded or encoded directions, and
$\Phi_m(\varphi)$ represents the decoded playback directions.
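The panning function of Eq. (4.15) is easy to evaluate numerically. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, the max-$\boldsymbol{r}_\mathrm{E}$ weights are taken as $a_m = \cos(\frac{\pi}{2}\frac{|m|}{N+1})$, and the integrals for the spread are replaced by sums over a dense uniform angle grid.

```python
import numpy as np

def circular_harmonics(N, phi):
    """Orthonormal circular harmonics of Eq. (4.10); rows m = -N..N, one column per angle."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    m = np.arange(-N, N + 1)[:, None]
    sines = np.sqrt(2.0) * np.sin(np.abs(m) * phi)     # used where m < 0
    cosines = np.sqrt(2.0) * np.cos(m * phi)           # used where m > 0
    Phi = np.where(m < 0, sines, np.where(m > 0, cosines, 1.0))
    return Phi / np.sqrt(2.0 * np.pi)

def panning_function(N, phi_s, phi, a_m):
    """Weighted Nth-order panning function, Eq. (4.15)."""
    return np.sum(a_m[:, None] * circular_harmonics(N, phi_s)
                  * circular_harmonics(N, phi), axis=0)

def spread_deg(N, a_m, K=720):
    """arccos of the r_E length of the continuous panning function aimed at phi_s = 0."""
    phi = 2.0 * np.pi * np.arange(K) / K - np.pi       # dense uniform grid
    g2 = panning_function(N, 0.0, phi, a_m) ** 2
    return np.rad2deg(np.arccos(np.sum(g2 * np.cos(phi)) / np.sum(g2)))

# numerical orthonormality check of the harmonics up to order 5
K = 720
grid = 2.0 * np.pi * np.arange(K) / K - np.pi
Phi = circular_harmonics(5, grid)
gram = (2.0 * np.pi / K) * Phi @ Phi.T

spreads = {}
for N in (1, 2, 5):
    m = np.abs(np.arange(-N, N + 1))
    basic = np.ones(2 * N + 1)
    max_re = np.cos(0.5 * np.pi * m / (N + 1))
    spreads[N] = (spread_deg(N, basic), spread_deg(N, max_re))
    print(f"N = {N}: basic {spreads[N][0]:.2f} deg, max-rE {spreads[N][1]:.2f} deg")
```

Under these assumptions the max-$\boldsymbol{r}_\mathrm{E}$ weighting should reproduce the spread of 90°/(N+1), while the basic weighting gives a wider spread because of its side lobes.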


Optimal sampling of the 2D panning function. In the theory of circular/spherical
polynomials in the variable $\zeta = \cos(\varphi - \varphi_s)$, so-called t-designs in 2D are optimal
point sets of given angles $\{\varphi_l\}$ with $l = 1, \ldots, L$ and size $L$. A t-design allows
perfect computation of the integral (constant part) over the polynomials $P_m(\zeta)$ of limited
degree $m \leq t$ by discrete summation

$$ \int_{-\pi}^{\pi} P_m(\cos\phi)\, d\phi = \frac{2\pi}{L} \sum_{l=1}^{L} P_m[\cos(\varphi_l - \varphi_s)], \tag{4.16} $$

regardless of any angular shift $\varphi_s$. In 2D, Chebyshev polynomials $T_m(\cos\phi) = \cos(m\phi)$
are orthogonal polynomials, therefore an Nth-order panning function composed
out of $\cos(m\phi)$ is always a polynomial of Nth degree. Knowing this, it is clear
that the integral over $g_N^2$, required to evaluate the loudness measure $E$, runs over a polynomial
of the order $2N$. The integral to calculate $\boldsymbol{r}_\mathrm{E}$ is over $g_N^2 \cos(\phi)$ and thus of the order
$2N + 1$. In playback, to get a perfectly panning-invariant loudness measure $E$ of the
continuous panning function and also the perfectly oriented $\boldsymbol{r}_\mathrm{E}$ vector of constant
spread $\arccos \|\boldsymbol{r}_\mathrm{E}\|$, the parameter $t$ must be $t \geq 2N + 1$. In 2D, all regular polygons
are t-designs with $L = t + 1$ points

$$ \varphi_l = \frac{2\pi}{L} (l - 1). \tag{4.17} $$

We can use the smallest set of $2N + 2$ angles $\varphi_l = \frac{180^\circ}{N+1} (l - 1)$ as optimal 2D layout.
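The t-design property of a regular polygon can be checked numerically with the Chebyshev polynomials $T_m(\cos\phi) = \cos(m\phi)$. The following sketch uses a hypothetical helper `polygon_mean` and an arbitrary shift of 17°; it compares the discrete mean over the $L = 2N + 2$ polygon with the continuous mean, which is 1 for $m = 0$ and 0 otherwise.

```python
import numpy as np

def polygon_mean(m, L, phi_s):
    """Mean of T_m(cos(phi_l - phi_s)) = cos(m (phi_l - phi_s)) over a regular L-gon."""
    phi_l = 2.0 * np.pi * np.arange(L) / L
    return np.mean(np.cos(m * (phi_l - phi_s)))

N = 2
L = 2 * N + 2                       # smallest optimal layout, a t-design with t = L - 1
phi_s = np.deg2rad(17.0)            # arbitrary angular shift
means = np.array([polygon_mean(m, L, phi_s) for m in range(L + 1)])

for m, value in enumerate(means):
    print(f"degree m = {m}: discrete mean {value:+.4f}")
```

Up to degree $t = L - 1 = 2N + 1$ the discrete means agree with the continuous ones for any shift; at degree $L$ the sum aliases to $\cos(L \varphi_s)$, which is why the polygon is not a design of higher degree.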


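4.5 Ambisonics Encoding and Optimal Decoding in 2D


To encode a signal $s$ into Ambisonic signals $\chi_m$, we multiply the signal with the
encoder representing the direction of the signal at the angle $\varphi_s$ by the weights $\Phi_m(\varphi_s)$

$$ \chi_m(t) = \Phi_m(\varphi_s)\, s(t), \tag{4.18} $$

or in vector notation

$$ \boldsymbol{\chi}_N = \boldsymbol{y}_N(\varphi_s)\, s, \tag{4.19} $$

using the column vector $\boldsymbol{y}_N = [\Phi_{-N}(\varphi_s), \ldots, \Phi_N(\varphi_s)]^T$ of $2N + 1$ components.
The Ambisonic signals in $\boldsymbol{\chi}_N$ are weighted by side-lobe suppressing weights
$\boldsymbol{a}_N = [a_{|-N|}, \ldots, a_N]$, expressed by the multiplication with a diagonal matrix $\mathrm{diag}\{\boldsymbol{a}_N\}$,
and then decoded to the $L$ loudspeaker signals $\boldsymbol{x}$ by a sampling decoder

$$ \boldsymbol{D} = \sqrt{\frac{2\pi}{L}}\, [\boldsymbol{y}_N(\varphi_1), \ldots, \boldsymbol{y}_N(\varphi_L)]^T = \sqrt{\frac{2\pi}{L}}\, \boldsymbol{Y}_N^T, \tag{4.20} $$

using

$$ \boldsymbol{x} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{\chi}_N. \tag{4.21} $$

In total, the system for encoding and decoding can also be written to yield a set of
loudspeaker gains for one virtual source

$$ \boldsymbol{g} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\varphi_s), \tag{4.22} $$

or in particular for the 2D sampling decoder $\boldsymbol{g} = \sqrt{\frac{2\pi}{L}}\, \boldsymbol{Y}_N^T\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\varphi_s)$.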
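The encoding and sampling-decoding chain of Eqs. (4.18) to (4.22) can be sketched as follows. The order $N = 3$, the source angle, the random test signal, and all helper names are assumptions for illustration; the max-$\boldsymbol{r}_\mathrm{E}$ weights are taken as $a_m = \cos(\frac{\pi}{2}\frac{|m|}{N+1})$ and the layout is the regular polygon with $L = 2N + 2$ loudspeakers.

```python
import numpy as np

def circular_harmonics(N, phi):
    """Orthonormal circular harmonics of Eq. (4.10); rows m = -N..N, one column per angle."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    m = np.arange(-N, N + 1)[:, None]
    Phi = np.where(m < 0, np.sqrt(2.0) * np.sin(np.abs(m) * phi),
                   np.where(m > 0, np.sqrt(2.0) * np.cos(m * phi), 1.0))
    return Phi / np.sqrt(2.0 * np.pi)

N = 3
L = 2 * N + 2                                          # optimal 2D layout
phi_l = 2.0 * np.pi * np.arange(L) / L                 # loudspeaker angles
a_N = np.cos(0.5 * np.pi * np.abs(np.arange(-N, N + 1)) / (N + 1))  # assumed max-rE weights
Y_N = circular_harmonics(N, phi_l)                     # (2N+1) x L
D = np.sqrt(2.0 * np.pi / L) * Y_N.T                   # sampling decoder, Eq. (4.20)

# encode a short test signal from 25 degrees and decode it
phi_s = np.deg2rad(25.0)
s = np.random.default_rng(0).standard_normal(16)
y_s = circular_harmonics(N, phi_s)[:, 0]
chi = np.outer(y_s, s)                                 # Ambisonic signals, Eq. (4.19)
x = D @ (a_N[:, None] * chi)                           # loudspeaker signals, Eq. (4.21)
g = D @ (a_N * y_s)                                    # loudspeaker gains, Eq. (4.22)

# loudness and r_E measures over all panning directions
theta_l = np.stack([np.cos(phi_l), np.sin(phi_l)])
def measures(angle):
    gains = D @ (a_N * circular_harmonics(N, angle)[:, 0])
    E = np.sum(gains ** 2)
    return E, theta_l @ gains ** 2 / E

pan = np.deg2rad(np.arange(0.0, 360.0, 3.0))
E = np.array([measures(p)[0] for p in pan])
r_len = np.array([np.linalg.norm(measures(p)[1]) for p in pan])
print(f"E ripple {np.ptp(E):.1e}, spread {np.rad2deg(np.arccos(r_len.mean())):.2f} deg")
```

The loudspeaker signals equal the gain vector times the source signal, and on this layout both $E$ and the length of $\boldsymbol{r}_\mathrm{E}$ stay constant over the panning angle.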


4.6 Listening Experiments on 2D Ambisonics 


There are several listening experiments discussing the features of Ambisonics, most
of which are summarized in [25], which will be discussed below, complemented with those
from [26].

The perceptually adjusted panning angle of 2nd-order max-$\boldsymbol{r}_\mathrm{E}$ Ambisonics
panning on 6 horizontal loudspeakers matches quite well the acoustic reference
direction as shown in Fig. 4.4, similar to MDAP in Fig. 3.8, but with a slightly more
accurate median by 0.5° on average, and in particular at side and back panning
directions.

Fig. 4.4 2nd-order max-$\boldsymbol{r}_\mathrm{E}$-weighted Ambisonic panning with pink noise on horizontal rings of
60°-spaced loudspeakers adjusted to perceptually match reference loudspeaker directions (harmonic
complex) every 15°. Markers and whiskers indicate 95% confidence intervals and medians, black
curve the $\boldsymbol{r}_\mathrm{E}$ vector model

Another aspect to investigate is how stable the results are for center and off-center
listening seats as shown in Fig. 4.5. It illustrates that max-$\boldsymbol{r}_\mathrm{E}$ with the highest
order achieves the best stability with regard to localization at off-center listening
seats. Astonishingly, the delay compensation for non-uniform delay times to the
center deteriorated the results, most probably because of the nearly linear frontal
arrangement of loudspeakers that is more robust to lateral shifts of the listening
positions than a circular arrangement.

Fig. 4.5 Results of Frank's 2008 pointing experiment [27] on center and off-center listening seats for 3
virtual sources (A, B, C) using 1st-order (left) and 5th-order (right) Ambisonics on 12 horizontal
loudspeakers (IEM CUBE) indicate a more stable localization with high orders. Moreover, for
5th-order, max-$\boldsymbol{r}_\mathrm{E}$ weighting and omission of delay compensation were preferred. Omission of
max-$\boldsymbol{r}_\mathrm{E}$ weights ("basic") or alternative "in-phase" weights that entirely suppress any side lobe
yield less precise localization at off-center listening positions

Figure 4.6a, b shows the direction histogram for two different weightings $a_m$,
and it illustrates that proper sidelobe suppression of the panning function by using
max-$\boldsymbol{r}_\mathrm{E}$ weights is decisive at shifted listening positions to avoid splitting of the
auditory image, as it appears in Fig. 4.6b without the weights (basic).

Fig. 4.6 Panels: (b) Frank 2013 off-center experiment, basic (no sidelobe suppression); (c) Stitt 2014
off-center experiments. Experiments on an off-center position in (a) show that max-$\boldsymbol{r}_\mathrm{E}$ outperforms the basic,
rectangularly truncated Fourier series at off-center listening positions, (b) where it can avoid splitting
of the auditory event. Stitt's experiments (c) imply that localization with higher orders is more stable
and that the localization deficiency at off-center listening seats seems to be proportional to the
ratio of the distance to the center to the radius of the loudspeaker ring, and not to the specific
time-delays that are larger for large loudspeaker rings, cf. [28]

Peter Stitt's work shows that the localization offsets at off-center listening seats do
not increase with the radius of the loudspeaker arrangement as long as the off-center
seat stays in proportion to the radius, Fig. 4.6c. The results are predicted by the sweet
area model from Sect. 2.2.9 for the first order (top row) and third order (bottom row)
in Fig. 4.7, with both sizes, small setup (left) and large setup (right).

Fig. 4.7 Predicted sweet area sizes using the $\boldsymbol{r}_\mathrm{E}$ model of Sect. 2.2.9 for loudspeaker layouts and
playback orders used in Stitt's experiments [28]: first order (top), third order (bottom), small (left),
and large (right)
Fig. 4.8 The perceptual sweet spot size as investigated by Frank [29] is nearly covering the entire
area enclosed by the IEM CUBE as a playback setup (black = 5th, gray = 3rd, light gray = 1st
order Ambisonics). It is smallest for 1st-order Ambisonics

Fig. 4.9 Frank's 2013 experiments showed for max-$\boldsymbol{r}_\mathrm{E}$ Ambisonics on three frontal loudspeakers that
large perceived widths are not entirely accurately modeled by the optimally-sampled, therefore
constant $\boldsymbol{r}_\mathrm{E}$ vector for low orders, however reasonably well, and significantly for
high orders/narrow width. Vertical axis: relative source width (narrow to wide); horizontal axis:
frontal direction on loudspeaker, quarter- and half-spaced; panels: 8 loudspeakers/3rd order and
16 loudspeakers/7th order


Frank’s 2016 experiments [29] used scales on the floor from which listeners read 
off where the sweet area ends in every radial direction, cf. Fig.4.8a. For Fig. 4.8b, 
the criterion for listeners to indicate leaving the sweet area was when the frontally 
panned sound was mapped outside the loudspeaker pairs L, C, and R. It showed that 
a sweet area providing perceptually plausible playback measures at least Z of the 
radius of the loudspeaker setup if the order is high enough.

The perceived width of auditory events is investigated in the experimental results
of Fig. 4.9, [25], in which pink noise was frontally panned in different orientations
of the loudspeaker ring (with one loudspeaker in front, with front direction lying
quarter- and half-spaced wrt. loudspeaker spacing). Listeners compared the width
of multiple stimuli, and the results were expected to indicate constant width for the
differently rotated loudspeaker ring, as the optimal arrangement with $L = 2N + 2$
provides constant $\boldsymbol{r}_\mathrm{E}$ length. The panning-invariant length is not perfectly reflected in
the perceived widths with 3rd order on 8 loudspeakers, for which the on-loudspeaker
position is perceived as being significantly wider. By contrast, the high-order experiment
with 7th order on 16 loudspeakers would perfectly validate the model.

Fig. 4.10 Frank's 2013 experiments on the variation

Figure 4.10 shows experiments investigating the time-variant change in sound
coloration for a pink-noise virtual source rotating at a speed of 100°/s, and for
different Ambisonic panning setups. There is an obvious advantage of a reduced
fluctuation in coloration at both listening positions, centered and off-center, when
using the side-lobe-suppressing "max-$\boldsymbol{r}_\mathrm{E}$" weighting instead of the "basic" rectangular
truncation of the Fourier series. At the off-center listening position, max-$\boldsymbol{r}_\mathrm{E}$
weights achieve good results with regard to constant coloration for both 3rd and 7th
order arrangements with 8 and 16 loudspeakers that were investigated.

How well would diffuse signals be preserved in playback? All the above experiments
deal with how non-diffuse signals are presented. To complement what is shown in
Fig. 1.21 of Chap. 1 with an explanation, the relation between Ambisonic order and its
ability to preserve diffuse fields is estimated here by the covariance between uncorrelated
directions. Assume a max-$\boldsymbol{r}_\mathrm{E}$-weighted Nth-order Ambisonic panning function
$g(\boldsymbol{\theta}_s^T \boldsymbol{\theta})$ that is normalized to $g(1) = 1$ and encodes two sounds $s_{1,2}$ from two directions
$\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, with the sounds being uncorrelated and unit-variance, $E\{s_i s_j\} = \delta_{ij}$.
We can find that the Ambisonic representation mixes the sounds at their respective
mapped directions and yields an increase of their correlation, $x_1 = s_1 + g_{12}\, s_2$ and
$x_2 = s_2 + g_{12}\, s_1$, using $g_{12} = g(\cos\phi)$ with the angle $\phi$ enclosed by $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$,

$$ R\{x_1 x_2\} = \frac{E\{x_1 x_2\}}{\sqrt{E\{x_1^2\}\, E\{x_2^2\}}} = \frac{E\{(1 + g_{12}^2)\, s_1 s_2 + g_{12} (s_1^2 + s_2^2)\}}{\sqrt{E\{s_1^2 + 2 g_{12}\, s_1 s_2 + g_{12}^2 s_2^2\}\, E\{s_2^2 + 2 g_{12}\, s_1 s_2 + g_{12}^2 s_1^2\}}} = \frac{2 g_{12}}{1 + g_{12}^2}. \tag{4.23} $$

This result was presented in Fig. 1.21 and was used to argue that the directional
separation of first-order Ambisonics by its high crosstalk term $g_{12}$ might be too
weak. Higher-order Ambisonics decreases this directional crosstalk and therefore
improves the representation of diffuse sound fields.
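To get a feeling for Eq. (4.23), the crosstalk term can be evaluated for a concrete panning function. The sketch below assumes the 2D max-$\boldsymbol{r}_\mathrm{E}$ pattern with weights $\cos(\frac{\pi}{2}\frac{m}{N+1})$ and two directions 90° apart; both choices, the function names, and the Monte Carlo check with Gaussian noise are illustrative assumptions and do not reproduce Fig. 1.21.

```python
import numpy as np

def max_re_pattern_2d(N, angle):
    """2D max-rE panning function over the enclosed angle, normalized to g(0) = 1 (assumed weights)."""
    m = np.arange(1, N + 1)
    a = np.cos(0.5 * np.pi * m / (N + 1))
    g = 1.0 + 2.0 * np.sum(a[:, None] * np.cos(np.outer(m, np.atleast_1d(angle))), axis=0)
    return g / (1.0 + 2.0 * np.sum(a))

def diffuse_correlation(g12):
    """Correlation of two uncorrelated unit-variance sounds after crosstalk g12, Eq. (4.23)."""
    return 2.0 * g12 / (1.0 + g12 ** 2)

angle = np.deg2rad(90.0)            # angle between the two directions (example value)
R = {}
for N in (1, 3, 5, 7):
    g12 = float(max_re_pattern_2d(N, angle)[0])
    R[N] = diffuse_correlation(g12)
    print(f"N = {N}: g12 = {g12:+.3f}, correlation = {R[N]:+.3f}")

# Monte Carlo cross-check of Eq. (4.23) for N = 1 with Gaussian noise signals
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal((2, 200_000))
g12 = float(max_re_pattern_2d(1, angle)[0])
R_sim = np.corrcoef(s1 + g12 * s2, s2 + g12 * s1)[0, 1]
print(f"simulated correlation for N = 1: {R_sim:+.3f}")
```

Under these assumptions the first-order correlation for directions 90° apart is still high, whereas it becomes small for the seventh order.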


4.7 Panning with Spherical Harmonics in 3D 


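In three space dimensions, the spherical coordinate system has a radius $r$ and two
angles, azimuth $\varphi$ indicating the polar angle of the orthogonal projection to the xy
plane, and the zenith angle $\vartheta$ indicating the angle to the z axis, according to the
right-handed spherical coordinate systems in ISO31-11, ISO80000-2, [30, 31], Fig. 4.11.

By the generalized chain rule, Appendix A.3 re-writes the Laplacian to spherical
coordinates in 3D with $r$ signifying the radius, $\varphi$ the azimuth angle, and
the zenith angle $\vartheta$ re-expressed as $\zeta = \frac{z}{r} = \cos\vartheta$, yielding the operator

$$ \Delta = \frac{\partial^2}{\partial r^2} + \frac{2}{r} \frac{\partial}{\partial r} + \frac{1}{r^2 (1 - \zeta^2)} \frac{\partial^2}{\partial \varphi^2} - \frac{2\zeta}{r^2} \frac{\partial}{\partial \zeta} + \frac{1 - \zeta^2}{r^2} \frac{\partial^2}{\partial \zeta^2}. $$

Any radius-dependent part is removed to define an eigenproblem yielding the basis
for panning functions, taking only the angular part $r^2 \Delta_{\varphi,\zeta}$,

$$ \left[ \frac{1}{1 - \zeta^2} \frac{\partial^2}{\partial \varphi^2} - 2\zeta \frac{\partial}{\partial \zeta} + (1 - \zeta^2) \frac{\partial^2}{\partial \zeta^2} \right] Y = -\lambda\, Y, \tag{4.24} $$

whose solution with $\lambda = n(n + 1)$ defines the spherical harmonics

$$ Y_n^m(\boldsymbol{\theta}) = Y_n^m(\varphi, \vartheta) = \Theta_n^m(\vartheta)\, \Phi_m(\varphi). \tag{4.25} $$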
The pre-requisites are (i) periodicity in $\varphi$ and (ii) that the function $Y_n^m$ is finite on
the sphere. In addition to the circular harmonics $\Phi_m$ expressing the dependency on
azimuth $\varphi$ according to Eq. (4.10), the spherical harmonics contain the associated
Legendre functions $P_n^m$ and their normalization term

$$ \Theta_n^m(\vartheta) = N_n^{|m|}\, P_n^{|m|}(\cos\vartheta) \tag{4.26} $$

to express the dependency on the zenith angle $\vartheta$. The index $n \geq 0$ expresses the order
and the directional resolution can be limited by requiring $0 \leq n \leq N$. The index $m$
is the degree and for each $n$ it is limited by $-n \leq m \leq n$.

Fig. 4.11 The spherical coordinate system

The spherical harmonics, Fig. 4.12, are orthonormal on the sphere $-\pi \leq \varphi \leq \pi$
and $0 \leq \vartheta \leq \pi$, and for unbounded order $N \rightarrow \infty$ they are complete; see also
Appendix A.3.7.

The spherical harmonics permit a series representation of square-integrable 3D
directional functions by the coefficients $\gamma_{nm}$

$$ g(\boldsymbol{\theta}) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \gamma_{nm}\, Y_n^m(\boldsymbol{\theta}). \tag{4.27} $$

From a known function $g(\boldsymbol{\theta})$, the coefficients are obtained by the transformation
integral over the unit sphere $\mathbb{S}^2$, cf. Appendix Eq. (A.38)

$$ \gamma_{nm} = \int_{\mathbb{S}^2} g(\boldsymbol{\theta})\, Y_n^m(\boldsymbol{\theta})\, d\boldsymbol{\theta}. \tag{4.28} $$
s2 


Note that the above N3D normalization $\int_{\mathbb{S}^2} |Y_n^m(\boldsymbol{\theta})|^2\, d\boldsymbol{\theta} = 1$ defines each
spherical harmonic except for an arbitrary phase it might be multiplied with.
Legendre functions for the zenith dependency might be defined differently in
literature, and for azimuth, some implementations use $\sin(m\varphi)$ instead of $\sin(|m|\varphi)$.
In Ambisonics, real-valued functions and the SN3D normalization are
preferred, as are positive signs of the first-order dipole components in the directions of
the respective coordinate axes, x, y, z. This might require involving
the Condon-Shortley phase $(-1)^m$ to correct the signs of the Legendre functions, or
$-1$ for $m < 0$ to correct the sign of azimuthal sinusoids, depending on the
implementation of the respective functions. It is often helpful to employ converters and
directional checks to ensure compatibility!

Fig. 4.12 Spherical harmonics indexed by Ambisonic channel number $\mathrm{ACN} = n^2 + n + m$; rows
show spherical harmonics for the order $0 \leq n \leq 3$ with the $2n + 1$ harmonics of the degree
$-n \leq m \leq n$. What is plotted is a polar diagram with the radius $R = 20 \lg |Y_n^m|$ normalized to the upper
30 dB of each pattern, with positive (gray) and negative (black) color indicating the sign. The order
$n$ counts the circular zero crossings, and $|m|$ counts those running through zenith and nadir


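3D panning function. An infinitely narrow direction range around a desired direction
$\boldsymbol{\theta}_s^T \boldsymbol{\theta} > \cos\varepsilon \rightarrow 1$ is represented by the transformation integral over the Dirac delta
$\delta(1 - \boldsymbol{\theta}_s^T \boldsymbol{\theta})$, cf. Eq. (A.41), so that the coefficients of the panning function are

$$ \gamma_{nm} = Y_n^m(\boldsymbol{\theta}_s). \tag{4.29} $$

As infinitely many spherical harmonics are complete, the panning function is

$$ g(\boldsymbol{\theta}) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_s)\, Y_n^m(\boldsymbol{\theta}) = \delta(1 - \boldsymbol{\theta}_s^T \boldsymbol{\theta}), \tag{4.30} $$

and in practice, the finite-resolution Nth-order panning function with $n \leq N$ employs
a weight $a_n$ to reduce side lobes and optimize the spread

$$ g_N(\boldsymbol{\theta}) = \sum_{n=0}^{N} \sum_{m=-n}^{n} a_n\, Y_n^m(\boldsymbol{\theta}_s)\, Y_n^m(\boldsymbol{\theta}). \tag{4.31} $$

The max-$\boldsymbol{r}_\mathrm{E}$ panning function uses the weights $a_n = P_n\left[\cos\left(\frac{137.9^\circ}{N + 1.51}\right)\right]$, as derived
in Appendix Eq. (A.46). The spread is now adjustable by the order to $\pm\frac{137.9^\circ}{N + 1.51}$.
Figure 4.13 shows a comparison to the basic weighting $a_n = 1$. An alternative expression
that uses Legendre polynomials $P_n$ and only depends on the angle $\phi$ to the panning
direction $\boldsymbol{\theta}_s$ is obtained by replacing the sum over $m$ by the spherical harmonics
addition theorem $\sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_s)\, Y_n^m(\boldsymbol{\theta}) = \frac{2n+1}{4\pi} P_n(\cos\phi)$,

$$ g_N(\phi) = \sum_{n=0}^{N} \frac{2n+1}{4\pi}\, a_n\, P_n(\cos\phi). \tag{4.32} $$

Comparison to first-order Ambisonics shows: now $Y_n^m(\boldsymbol{\theta}_s)$ represents the recorded
or encoded directions, and $Y_n^m(\boldsymbol{\theta})$ represents the decoded playback directions.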
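The axisymmetric form in terms of Legendre polynomials lends itself to a quick numerical check of the spread. The following sketch relies on NumPy's `numpy.polynomial.legendre` module and assumes the max-$\boldsymbol{r}_\mathrm{E}$ weights $a_n = P_n[\cos(137.9^\circ/(N+1.51))]$; the function names and the use of Gauss-Legendre quadrature for the $\boldsymbol{r}_\mathrm{E}$ integrals are choices made here for illustration.

```python
import numpy as np
from numpy.polynomial import legendre

def max_re_weights_3d(N):
    """Assumed 3D max-rE weights a_n = P_n(cos(137.9 deg / (N + 1.51)))."""
    r = np.cos(np.deg2rad(137.9 / (N + 1.51)))
    return np.array([legendre.legval(r, np.eye(N + 1)[n]) for n in range(N + 1)])

def panning_function_3d(N, cos_angle, a_n):
    """Axisymmetric panning function as a weighted Legendre series in cos(angle)."""
    n = np.arange(N + 1)
    return legendre.legval(cos_angle, (2 * n + 1) / (4.0 * np.pi) * a_n)

def r_E_length(N, a_n):
    """Length of r_E of the continuous function via Gauss-Legendre quadrature in zeta = cos(angle)."""
    zeta, w = legendre.leggauss(2 * N + 2)     # exact for the polynomial degrees involved
    g2 = panning_function_3d(N, zeta, a_n) ** 2
    return np.sum(w * g2 * zeta) / np.sum(w * g2)

results = {}
for N in (1, 2, 3, 5):
    basic = r_E_length(N, np.ones(N + 1))
    max_re = r_E_length(N, max_re_weights_3d(N))
    results[N] = (basic, max_re)
    print(f"N = {N}: spread basic {np.rad2deg(np.arccos(basic)):.1f} deg, "
          f"max-rE {np.rad2deg(np.arccos(max_re)):.1f} deg, "
          f"rule {137.9 / (N + 1.51):.1f} deg")
```

Under these assumptions the spread of the max-$\boldsymbol{r}_\mathrm{E}$ pattern should follow the 137.9°/(N+1.51) rule closely and stay below that of the basic weighting.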


Optimal sampling of the 3D panning function. In the theory of spherical polynomials
in the variable $\zeta = \boldsymbol{\theta}_s^T \boldsymbol{\theta}$, so-called t-designs describe point sets of given
directions $\{\boldsymbol{\theta}_l\}$ with $l = 1, \ldots, L$ and size $L$ that allow perfect computation of the
integral (constant part) over the polynomials $P_n(\zeta)$ of limited order $n \leq t$ by discrete
summation

$$ \int_{-\pi}^{\pi} d\varphi \int_{-1}^{1} P_n(\zeta)\, d\zeta = \frac{4\pi}{L} \sum_{l=1}^{L} P_n(\boldsymbol{\theta}_l^T \boldsymbol{\theta}_s), \tag{4.33} $$

relative to any axis $\boldsymbol{\theta}_s$ the point set is projected onto. In 3D, the Legendre polynomials
$P_n(\zeta)$ are orthogonal polynomials, therefore an Nth-order panning function
composed thereof is a polynomial of Nth order. The loudness measure $E$ is calculated
by the integral over $g_N^2$, therefore over a polynomial of the order $2N$. The
integral to calculate $\boldsymbol{r}_\mathrm{E}$ runs over $g_N^2\, \zeta$, therefore over a polynomial of the order
$2N + 1$. In playback, to get a perfectly panning-invariant loudness measure $E$ of the
continuous panning function and also the perfectly oriented $\boldsymbol{r}_\mathrm{E}$ vector of constant
spread $\arccos \|\boldsymbol{r}_\mathrm{E}\|$, the parameter $t$ must be $t \geq 2N + 1$. In 3D there are only 5
geometrically regular layouts

Fig. 4.13 3D unweighted $a_n = 1$ basic and weighted max-$\boldsymbol{r}_\mathrm{E}$ Ambisonic panning functions for the
orders $N = 1, 2, 5$

• the tetrahedron, $L = 4$ corners, is a 2-design,
• the octahedron, $L = 6$ corners, is a 3-design,
• the hexahedron (cube), $L = 8$ corners, is a 3-design,
• the icosahedron, $L = 12$ corners, is a 5-design,
• the dodecahedron, $L = 20$ corners, is a 5-design.

For instance, for $N = 1$, the octahedron is a suitable spherical design, for $N = 2$, the
icosahedral or dodecahedral layouts are suitable.
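The design degrees quoted for the tetrahedron and the octahedron can be verified numerically with Legendre polynomials. The sketch below is an illustration with hypothetical names: it compares the mean of $P_n(\boldsymbol{\theta}_l^T \boldsymbol{\theta}_s)$ over the corner set with the continuous mean (1 for $n = 0$, otherwise 0) for many randomly drawn axes $\boldsymbol{\theta}_s$.

```python
import numpy as np
from numpy.polynomial import legendre

tetrahedron = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float) / np.sqrt(3.0)
octahedron = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                       [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)

def design_error(points, n, trials=200, seed=0):
    """Largest deviation of the mean of P_n(theta_l^T theta_s) over the point set
    from the continuous mean, taken over randomly drawn unit axes theta_s."""
    rng = np.random.default_rng(seed)
    axes = rng.standard_normal((trials, 3))
    axes /= np.linalg.norm(axes, axis=1, keepdims=True)
    c = np.zeros(n + 1)
    c[n] = 1.0                                   # selects the single polynomial P_n
    means = legendre.legval(axes @ points.T, c).mean(axis=1)
    target = 1.0 if n == 0 else 0.0
    return np.max(np.abs(means - target))

errors = {}
for name, points in [("tetrahedron", tetrahedron), ("octahedron", octahedron)]:
    errors[name] = [design_error(points, n) for n in range(5)]
    print(name, ["%.1e" % e for e in errors[name]])
```

The tetrahedron should reproduce the constant part up to order 2 and fail at order 3, and the octahedron should hold up to order 3 and fail at order 4, in line with the list above.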

Exceeding the geometrically regular layouts, there are designs found by optimization
to be regular under the mathematical rule to approximate $\int_{\mathbb{S}^2} Y_n^m(\boldsymbol{\theta})\, d\boldsymbol{\theta} = \sqrt{4\pi}\, \delta_n$
accurately by $\frac{4\pi}{L} \sum_{l=1}^{L} Y_n^m(\boldsymbol{\theta}_l)$ for all $n \leq t$ and $|m| \leq n$. A large collection can be found
by Hardin and Sloane [32], Gräf and Potts [33], and Womersley [34] available on
the following websites
http://neilsloane.com/sphdesigns/dim3/
http://homepage.univie.ac.at/manuel.graef/quadrature.php
(Chebyshev-type Quadratures on $\mathbb{S}^2$), and
https://web.maths.unsw.edu.au/~rsw/Sphere/EffSphDes/ss.html.

Figure 4.14 gives some graphical examples.

Fig. 4.14 t-designs from Gräf's website (Chebyshev-type quadrature)
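4.8 Ambisonics Encoding and Optimal Decoding in 3D


To encode a signal $s$ into Ambisonic signals $\chi_{nm}$, we multiply the signal with the
encoder representing the direction $\boldsymbol{\theta}_s$ of the signal by the weights $Y_n^m(\boldsymbol{\theta}_s)$

$$ \chi_{nm}(t) = Y_n^m(\boldsymbol{\theta}_s)\, s(t), \tag{4.34} $$

or in vector notation

$$ \boldsymbol{\chi}_N = \boldsymbol{y}_N(\boldsymbol{\theta}_s)\, s, \tag{4.35} $$

using the column vector $\boldsymbol{y}_N = [Y_0^0(\boldsymbol{\theta}_s), Y_1^{-1}(\boldsymbol{\theta}_s), \ldots, Y_N^N(\boldsymbol{\theta}_s)]^T$ of $(N + 1)^2$ components.
The Ambisonic signals in $\boldsymbol{\chi}_N$ are weighted by side-lobe suppressing weights
$\boldsymbol{a}_N = [a_0, a_1, a_1, a_1, a_2, \ldots, a_N]$, expressed by the multiplication with a diagonal
matrix $\mathrm{diag}\{\boldsymbol{a}_N\}$, and then decoded to the $L$ loudspeaker signals $\boldsymbol{x}$ by a sampling
decoder

$$ \boldsymbol{D} = \sqrt{\frac{4\pi}{L}}\, [\boldsymbol{y}_N(\boldsymbol{\theta}_1), \ldots, \boldsymbol{y}_N(\boldsymbol{\theta}_L)]^T = \sqrt{\frac{4\pi}{L}}\, \boldsymbol{Y}_N^T, \tag{4.36} $$

using

$$ \boldsymbol{x} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{\chi}_N. \tag{4.37} $$

In total, the system for encoding Eq. (4.35) and decoding Eq. (4.36) can also be
written to yield loudspeaker gains for one signal

$$ \boldsymbol{g} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_s), \tag{4.38} $$

or in particular for the 3D sampling decoding $\boldsymbol{g} = \sqrt{\frac{4\pi}{L}}\, \boldsymbol{Y}_N^T\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_s)$.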
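For first order, the 3D chain can be written out without a spherical-harmonics library. The sketch below is restricted to $N = 1$ on an octahedron and rests on several assumptions: the real N3D harmonics are typed in by hand in ACN order (W, Y, Z, X), the weights follow $a_n = P_n[\cos(137.9^\circ/(N+1.51))]$ so that $a_1$ is simply that cosine, and the helper names are hypothetical.

```python
import numpy as np

def sh_first_order(theta):
    """Real N3D spherical harmonics up to n = 1 in ACN order (W, Y, Z, X)
    for unit vectors theta (K x 3); hand-written for this sketch only."""
    x, y, z = np.atleast_2d(theta).T
    return np.stack([np.ones_like(x), np.sqrt(3.0) * y,
                     np.sqrt(3.0) * z, np.sqrt(3.0) * x]) / np.sqrt(4.0 * np.pi)

octahedron = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                       [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
N, L = 1, 6
c = np.cos(np.deg2rad(137.9 / (N + 1.51)))
a_N = np.array([1.0, c, c, c])                   # a_0, a_1, a_1, a_1 with a_1 = P_1(c) = c
Y_N = sh_first_order(octahedron)                 # (N+1)^2 x L
D = np.sqrt(4.0 * np.pi / L) * Y_N.T             # sampling decoder, Eq. (4.36)

def panning_gains(theta_s):
    """Loudspeaker gains for one source direction, Eq. (4.38)."""
    return D @ (a_N * sh_first_order(theta_s)[:, 0])

rng = np.random.default_rng(1)
dirs = rng.standard_normal((100, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
E = np.array([np.sum(panning_gains(d) ** 2) for d in dirs])
r_E = np.array([octahedron.T @ panning_gains(d) ** 2 / np.sum(panning_gains(d) ** 2)
                for d in dirs])
expected_length = 2.0 * c / (1.0 + 3.0 * c ** 2)  # r_E length of the continuous first-order pattern
print(f"E ripple {np.ptp(E):.1e}, ||r_E|| = {np.linalg.norm(r_E, axis=1).mean():.4f}")
```

Because the octahedron is a 3-design and $t \geq 2N + 1 = 3$ holds for first order, both the loudness measure and the $\boldsymbol{r}_\mathrm{E}$ vector should be independent of the panning direction, with $\boldsymbol{r}_\mathrm{E}$ pointing exactly at the source.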
4.9 Ambisonic Decoding to Loudspeakers


Ambisonic decoding to loudspeakers has been dealt with by numerous researchers
in the past, particularly because results are not very stable for first-order Ambisonics,
and later because they strongly depend on how uniform the loudspeaker layout is for
higher-order Ambisonics. Moreover, Solvang found that even the use of too many
loudspeakers has a degrading effect [35].

For first-order decoding, the Vienna decoders by Michael Gerzon [36] are often
cited, and for higher-order Ambisonic decoding, one can, e.g., find works by Daniel
with max-$\boldsymbol{r}_\mathrm{E}$ [37] and pseudo-inverse decoding [10], also by Poletti [14, 38, 39].

What turned out to be the most practical solution is the All-Round Ambisonic
Decoding approach (AllRAD) due to its feature of allowing imaginary loudspeaker
insertion and downmix as described in the sections above, cf. [40]. It moreover does
not have restrictions on the Ambisonics order, which for other decoders often yields
poor controllability of panning-dependent fluctuations in loudness and directional
mapping errors.

The playable set of directions $\boldsymbol{\theta}_l$ or $\varphi_l$ is usually finite and discrete, and it is
represented by the surrounding loudspeakers' directions. The directional distribution
of the surrounding loudspeakers is typically not a t-design (with $t \geq 2N + 1$) in
general, and sometimes not even a regular polygon with $L \geq 2N + 2$ loudspeakers for
2D, in particular. In such cases, it is extremely helpful to be aware of the properties
of the various decoder design methods.


4.9.1 Sampling Ambisonic Decoder (SAD) 


The sampling decoder as introduced above is the simplest decoding method. For two ($\mathrm{D}=2$) and three ($\mathrm{D}=3$) dimensions, it uses the matrix $\boldsymbol{Y}_N = [\boldsymbol{y}_N(\boldsymbol{\theta}_1), \ldots, \boldsymbol{y}_N(\boldsymbol{\theta}_L)]$ containing the respective circular or spherical harmonics $\boldsymbol{y}_N(\boldsymbol{\theta})$ sampled at the loudspeaker directions $\{\boldsymbol{\theta}_l\}$,

$$\boldsymbol{D} = \sqrt{\frac{S_{\mathrm{D}-1}}{L}}\,\boldsymbol{Y}_N^T, \tag{4.39}$$

with the circumference of the unit circle denoted as $S_1 = 2\pi$ or the surface of the unit sphere written as $S_2 = 4\pi$. The factor $\sqrt{S_{\mathrm{D}-1}/L}$ expresses that each loudspeaker synthesizes a fraction of the $E$ measure on the circle or sphere of the surrounding directions. However, the sampling decoder would yield neither perfectly constant loudness and width measures, $E$ and $\|\boldsymbol{r}_E\|$, nor a correct aiming of the localization measure $\boldsymbol{r}_E$ if the loudspeaker layout was not optimal. Concerning loudness, for instance, when panning towards directional regions of poor loudspeaker coverage, sampling misses the main lobe of the panning function, which yields a noticeably reduced loudness.
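
As a minimal sketch of Eq. (4.39) for the 2D case, the following Python/NumPy code builds a sampling decoder from real circular harmonics and checks its behavior on a regular ring. The helper names (`circ_harmonics`, `sampling_decoder_2d`, `panning_gains`), the harmonic normalization (orthonormal on the unit circle), and the unit weights $\boldsymbol{a}_N = 1$ are assumptions made for illustration only.

```python
import numpy as np

def circ_harmonics(N, phi):
    """Real circular harmonics up to order N, orthonormal on the unit circle.
    Returns an array of shape (2N+1, number of angles)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2.0 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.vstack(rows)

def sampling_decoder_2d(N, phi_ls):
    """Sampling decoder D = sqrt(S_1 / L) * Y_N^T with S_1 = 2*pi (Eq. 4.39 in 2D)."""
    Y = circ_harmonics(N, phi_ls)            # (2N+1) x L
    L = Y.shape[1]
    return np.sqrt(2.0 * np.pi / L) * Y.T    # L x (2N+1)

def panning_gains(D, N, phi_s):
    """Loudspeaker gains g = D y_N(phi_s), assuming unit weights a_N = 1."""
    return D @ circ_harmonics(N, phi_s)      # L x (number of panning angles)

N = 3
phi_ring = np.deg2rad(np.arange(0, 360, 45))     # regular ring of 8 loudspeakers
D_sad = sampling_decoder_2d(N, phi_ring)

phi_pan = np.deg2rad(np.arange(0, 360, 5))       # panning directions to test
E = np.sum(panning_gains(D_sad, N, phi_pan) ** 2, axis=0)
print("D^T D = I:", np.allclose(D_sad.T @ D_sad, np.eye(2 * N + 1)))
print("loudness spread in dB:", 10 * np.log10(E.max() / E.min()))
```

On this regular ring, the decoder is column-orthogonal and the loudness measure does not depend on the panning angle; both properties are lost as soon as the ring becomes irregular.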
4.9.2 Mode Matching Decoder (MAD)


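The mode-matching method is used in [10, 39] and yields a fundamentally different decoder design. Its concept is to re-encode the gain vector $\boldsymbol{g}$ of the loudspeakers for any panning direction $\boldsymbol{\theta}_s$ by the encoding matrix $\boldsymbol{Y}_N = [\boldsymbol{y}_N(\boldsymbol{\theta}_1), \ldots, \boldsymbol{y}_N(\boldsymbol{\theta}_L)]$ for all loudspeaker directions $\{\boldsymbol{\theta}_l\}$. Ideally, the re-encoded result should match the encoding of the panning direction with sidelobes suppressed

$$\boldsymbol{Y}_N\,\boldsymbol{g} = \mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s).$$

Using the definition $\boldsymbol{g} = \boldsymbol{D}\,\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s)$ of the panning gains, we obtain

$$\boldsymbol{Y}_N\,\boldsymbol{D}\,\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s) = \mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s),$$

$$\Rightarrow\quad \boldsymbol{D} = \sqrt{\frac{L}{S_{\mathrm{D}-1}}}\,\boldsymbol{Y}_N^T\,(\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1} \tag{4.40}$$

so that, apart from the scaling factor $\sqrt{L/S_{\mathrm{D}-1}}$ that matches the level of the sampling decoder, the decoder $\boldsymbol{D}$ is required to be right-inverse to the matrix $\boldsymbol{Y}_N$, i.e. $\boldsymbol{Y}_N\boldsymbol{Y}_N^T(\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1} = \boldsymbol{I}$, see Eq. (A.63) in Appendix A.4. For the inverse of $\boldsymbol{Y}_N\boldsymbol{Y}_N^T$ to exist, it is necessary to have at least as many loudspeakers as harmonics, i.e. $L \geq (N+1)^2$ for $\mathrm{D}=3$ or $L \geq 2N+1$ for $\mathrm{D}=2$. However, this is not yet a sufficient criterion: in directions poorly covered with loudspeakers, the inversion will boost the loudness, and the inverse $(\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1}$ is often numerically ill-conditioned unless the loudspeaker layout is at least uniformly designed. In particular, mode-matching decoding is ill-conditioned on hemispherical or semicircular loudspeaker layouts. The solution is equivalently described by the more general pseudo-inverse $\boldsymbol{Y}_N^\dagger$, which is right-inverse for fat matrices.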
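
The following sketch illustrates Eq. (4.40) in 2D using NumPy's pseudo-inverse, and it compares the conditioning of $\boldsymbol{Y}_N\boldsymbol{Y}_N^T$ on a regular ring with that on the same ring after removing one loudspeaker. The function names and the circular-harmonic normalization are assumptions for illustration and are not taken from the cited works.

```python
import numpy as np

def circ_harmonics(N, phi):
    """Real circular harmonics up to order N, orthonormal on the unit circle."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2.0 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.vstack(rows)

def mode_matching_decoder_2d(N, phi_ls):
    """Mode-matching decoder D = sqrt(L / S_1) * pinv(Y_N), cf. Eq. (4.40) in 2D."""
    Y = circ_harmonics(N, phi_ls)                 # (2N+1) x L
    L = Y.shape[1]
    if L < 2 * N + 1:
        raise ValueError("need at least 2N+1 loudspeakers in 2D")
    # For a fat, full-row-rank Y, pinv(Y) equals Y^T (Y Y^T)^{-1}.
    return np.sqrt(L / (2.0 * np.pi)) * np.linalg.pinv(Y)

N = 3
phi_ring = np.deg2rad(np.arange(0, 360, 45))      # regular ring, 8 loudspeakers
phi_gap = np.delete(phi_ring, 6)                  # loudspeaker at 270 deg removed

Y_ring = circ_harmonics(N, phi_ring)
Y_gap = circ_harmonics(N, phi_gap)
D_mad_ring = mode_matching_decoder_2d(N, phi_ring)
D_sad_ring = np.sqrt(2.0 * np.pi / 8) * Y_ring.T  # sampling decoder, Eq. (4.39)

cond_ring = np.linalg.cond(Y_ring @ Y_ring.T)
cond_gap = np.linalg.cond(Y_gap @ Y_gap.T)
print("MAD equals SAD on the regular ring:", np.allclose(D_mad_ring, D_sad_ring))
print("condition number, regular ring:", cond_ring)
print("condition number, one loudspeaker removed:", cond_gap)
```

In this example, removing a single loudspeaker raises the condition number of $\boldsymbol{Y}_N\boldsymbol{Y}_N^T$ from 1 to 8, which hints at how quickly the inversion amplifies gains towards uncovered directions.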


4.9.3 Energy Preservation on Optimal Layouts


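For instance, for an order of N = 2, 2D Ambisonics should work optimally with a ring of 45° spaced loudspeakers on the horizon, a circular $(2N+1)$-design, or for 3D, a spherical $(2N+1)$-design. On a $t$-design selected by $t \geq 2N$, the loudness measure $E$ is panning-invariant, in general,

$$E = \|\boldsymbol{g}\|^2 = \boldsymbol{y}_N^T(\boldsymbol{\theta}_s)\,\mathrm{diag}\{\boldsymbol{a}_N\}\,\underbrace{\boldsymbol{D}^T\boldsymbol{D}}_{=\boldsymbol{I}}\,\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s) = \|\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s)\|^2 = \mathrm{const.}$$

This is because a $t \geq 2N$-design discretization preserves orthonormality

$$\int_{\mathbb{S}^{\mathrm{D}-1}} \boldsymbol{y}_N(\boldsymbol{\theta})\,\boldsymbol{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \frac{S_{\mathrm{D}-1}}{L}\sum_{l=1}^{L} \boldsymbol{y}_N(\boldsymbol{\theta}_l)\,\boldsymbol{y}_N^T(\boldsymbol{\theta}_l) = \frac{S_{\mathrm{D}-1}}{L}\,\boldsymbol{Y}_N\boldsymbol{Y}_N^T = \boldsymbol{I}, \tag{4.41}$$

which implies for the sampling decoder $\boldsymbol{D}^T\boldsymbol{D} = \frac{S_{\mathrm{D}-1}}{L}\,\boldsymbol{Y}_N\boldsymbol{Y}_N^T = \boldsymbol{I}$, and we notice the panning-invariant norm of $g(\boldsymbol{\theta})$ within its coefficients $\boldsymbol{\gamma}_N = \mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s)$ by the Parseval theorem $\int g^2(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \|\boldsymbol{\gamma}_N\|^2$. The panning-invariant $E$ measure also holds for the mode-matching decoder using a $t \geq 2N$-design, as it becomes equivalent to a sampling decoder

$$\boldsymbol{D} = \sqrt{\frac{L}{S_{\mathrm{D}-1}}}\,\boldsymbol{Y}_N^T\,(\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1} = \sqrt{\frac{L}{S_{\mathrm{D}-1}}}\,\boldsymbol{Y}_N^T\,\frac{S_{\mathrm{D}-1}}{L} = \sqrt{\frac{S_{\mathrm{D}-1}}{L}}\,\boldsymbol{Y}_N^T.$$

Under these ideal conditions, both decoders are energy-preserving.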


4.9.4 Loudness Deficiencies on Sub-optimal Layouts 


For 2D layouts, Fig. 4.15 shows what happens if a decoder is calculated for a $t \geq 2N+1$-design with one loudspeaker removed: for panning across the gap, the sampling Ambisonic decoder (SAD) yields a quieter signal, moderate localization errors, and width fluctuation, whereas the mode-matching decoder (MAD) yields a strong loudness increase and severe jumps in localization and width. MAD is therefore not very practical with sub-optimal layouts, and SAD only slightly more so.
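
To make the two failure modes tangible, the sketch below evaluates the loudness measure $E$ and the energy vector $\boldsymbol{r}_E$ for SAD and MAD on a ring of 8 loudspeakers with the one at 270° (i.e. −90°) removed, for $N = 3$. It assumes unit weights $\boldsymbol{a}_N = 1$ (no max-$\boldsymbol{r}_E$ weighting), so its numbers need not match those of the figure; `measures` and `circ_harmonics` are hypothetical helper names.

```python
import numpy as np

def circ_harmonics(N, phi):
    """Real circular harmonics up to order N, orthonormal on the unit circle."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2.0 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.vstack(rows)

def measures(D, N, phi_ls, phi_s):
    """Loudness E and energy vector r_E for one panning angle, with a_N = 1."""
    g = D @ circ_harmonics(N, phi_s)[:, 0]                  # L loudspeaker gains
    E = np.sum(g ** 2)
    theta = np.stack([np.cos(phi_ls), np.sin(phi_ls)])      # 2 x L unit vectors
    r_E = theta @ (g ** 2) / E
    return E, r_E

N = 3
phi_ring = np.deg2rad(np.arange(0, 360, 45))
phi_gap = np.delete(phi_ring, 6)                            # remove 270 deg
Y_gap = circ_harmonics(N, phi_gap)
L = Y_gap.shape[1]

D_sad = np.sqrt(2.0 * np.pi / L) * Y_gap.T                  # Eq. (4.39)
D_mad = np.sqrt(L / (2.0 * np.pi)) * np.linalg.pinv(Y_gap)  # Eq. (4.40)

E_ideal = (2 * N + 1) / (2.0 * np.pi)                       # ||y_N||^2 for a_N = 1
phi_s = np.deg2rad(270.0)                                   # pan into the gap
for name, D in (("SAD", D_sad), ("MAD", D_mad)):
    E, r_E = measures(D, N, phi_gap, phi_s)
    print(name, "loudness re. ideal: %.2f dB" % (10 * np.log10(E / E_ideal)),
          "| r_E length: %.2f" % np.linalg.norm(r_E))
```

With these assumptions, panning into the gap reduces the loudness of SAD by a factor of 7 (about 8.5 dB) and boosts that of MAD by the same factor, which is in line with the qualitative behavior described above.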


4.9.5 Energy-Preserving Ambisonic Decoder (EPAD) 


To establish panning-invariant loudness for decoding to non-uniform surround loudspeaker layouts, one can ensure a constant loudness measure $E$ by enforcing $\boldsymbol{D}^T\boldsymbol{D} = \boldsymbol{I}$, which is otherwise only achieved on $t \geq 2N$-designs. We may search for a decoding matrix $\boldsymbol{D}$ whose entries are closest to the sampling decoder under the constraint to be column-orthogonal:

$$\left\|\boldsymbol{D} - \sqrt{\tfrac{S_{\mathrm{D}-1}}{L}}\,\boldsymbol{Y}_N^T\right\|_{\mathrm{Fro}}^2 \rightarrow \min \tag{4.42}$$

$$\text{subject to } \boldsymbol{D}^T\boldsymbol{D} = \boldsymbol{I}.$$

The singular value decomposition of

$$\boldsymbol{Y}_N^T = \boldsymbol{U}\,[\mathrm{diag}\{\boldsymbol{s}\},\,\boldsymbol{0}]^T\,\boldsymbol{V}^T \tag{4.43}$$

can be used to create

$$\boldsymbol{D} = \boldsymbol{U}\,[\boldsymbol{I},\,\boldsymbol{0}]^T\,\boldsymbol{V}^T \tag{4.44}$$

by replacing the singular values $\boldsymbol{s}$ with ones. Such a decoder is column-orthogonal, as the singular-value decomposition delivers $\boldsymbol{U}^T\boldsymbol{U} = \boldsymbol{I}$ and $\boldsymbol{V}\boldsymbol{V}^T = \boldsymbol{I}$, and as a consequence,¹ $\boldsymbol{D}^T\boldsymbol{D} = \boldsymbol{I}$. The energy-preserving decoder in this basic version requires $L \geq 2N+1$ loudspeakers in 2D or $L \geq (N+1)^2$ in 3D to work.

Note that if the loudspeaker setup directions are already a $t \geq 2N$-design, the sampling, mode-matching, and energy-preserving decoders are equivalent.
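
A minimal NumPy sketch of Eqs. (4.43) and (4.44) is given below for the 2D case. It uses the economy-size SVD, whose product $\boldsymbol{U}\boldsymbol{V}^T$ equals $\boldsymbol{U}[\boldsymbol{I}, \boldsymbol{0}]^T\boldsymbol{V}^T$ of the full SVD; the function names, the test layout (ring of 8 with one loudspeaker removed), and the unit weights $\boldsymbol{a}_N = 1$ are assumptions for illustration.

```python
import numpy as np

def circ_harmonics(N, phi):
    """Real circular harmonics up to order N, orthonormal on the unit circle."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2.0 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.vstack(rows)

def epad(Y):
    """Energy-preserving decoder from the harmonics matrix Y of shape (n, L):
    SVD of Y^T with the singular values replaced by ones, cf. Eqs. (4.43)-(4.44)."""
    n, L = Y.shape
    if L < n:
        raise ValueError("basic EPAD needs at least as many loudspeakers as harmonics")
    U, s, Vt = np.linalg.svd(Y.T, full_matrices=False)   # U: L x n, Vt: n x n
    return U @ Vt                                         # equals U [I, 0]^T V^T

N = 3
phi_gap = np.deg2rad(np.delete(np.arange(0, 360, 45), 6))    # ring of 8 minus 270 deg
D_epad = epad(circ_harmonics(N, phi_gap))

phi_pan = np.deg2rad(np.arange(0, 360, 5))
g = D_epad @ circ_harmonics(N, phi_pan)                       # a_N = 1 assumed
E = np.sum(g ** 2, axis=0)
print("D^T D = I:", np.allclose(D_epad.T @ D_epad, np.eye(2 * N + 1)))
print("loudness spread in dB:", 10 * np.log10(E.max() / E.min()))
```

Even with the gap in the layout, the loudness measure stays constant over all panning directions, at the cost of the directional mapping errors discussed in the following sections.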


4.9.6 All-Round Ambisonic Decoding (AllRAD)


In Chap. 3 on vector-base amplitude panning methods, a well-balanced panning result in terms of loudness, width, and localization was achieved by MDAP, which distributes a signal to an arrangement of several superimposed VBAP virtual sources. Hereby $E = \mathrm{const.}$, $\boldsymbol{r}_E \approx r_E\,\boldsymbol{\theta}_s$, and $r_E \approx \mathrm{const.}$, with the length $r_E = \|\boldsymbol{r}_E\|$. This works for nearly any loudspeaker layout.

While, to calculate loudspeaker gains, MDAP superimposes an arrangement of discrete virtual sources within a range of $\pm\alpha$ around the panning direction $\boldsymbol{\theta}_s$, one could also think of superimposing a quasi-continuous distribution of virtual sources that are weighted by a continuous panning function $g(\boldsymbol{\theta})$.

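The ideal continuous panning function $g(\boldsymbol{\theta})$ of axisymmetric directional spread around the panning direction $\boldsymbol{\theta}_s$ is described by $g(\boldsymbol{\theta}) = \boldsymbol{y}_N^T(\boldsymbol{\theta})\,\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s)$, the Ambisonic panning function. This rotation-invariant continuous function is optimal in terms of loudness, width, and localization measures, which are all evaluated by continuous integrals: $E = \int g^2(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \mathrm{const.}$ expresses panning-invariant loudness, and $\boldsymbol{r}_E = \frac{1}{E}\int g^2(\boldsymbol{\theta})\,\boldsymbol{\theta}\,\mathrm{d}\boldsymbol{\theta} = r_E\,\boldsymbol{\theta}_s$ indicates a perfect alignment $\boldsymbol{r}_E \parallel \boldsymbol{\theta}_s$ with the panning direction and a panning-invariant width $r_E = \mathrm{const.}$ However, the optimal values of these integrals are only preserved by discretization with optimal $t \geq 2N+1$-design loudspeaker layouts.

¹ In detail, this follows from $\boldsymbol{D}^T\boldsymbol{D} = \boldsymbol{V}[\boldsymbol{I}, \boldsymbol{0}]\,\boldsymbol{U}^T\boldsymbol{U}\,[\boldsymbol{I}, \boldsymbol{0}]^T\boldsymbol{V}^T = \boldsymbol{V}[\boldsymbol{I}, \boldsymbol{0}][\boldsymbol{I}, \boldsymbol{0}]^T\boldsymbol{V}^T = \boldsymbol{V}\boldsymbol{V}^T = \boldsymbol{I}$.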


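All-round Ambisonic decoding (AllRAD) is preceded by the work of Batke and Keiler [16]. They describe Ambisonic panning $\hat{\boldsymbol{g}}_{\mathrm{VBAP}}(\boldsymbol{\theta}) = \boldsymbol{D}\,\boldsymbol{y}_N(\boldsymbol{\theta})$ by a decoder $\boldsymbol{D}$ whose result matches best with the VBAP gains $\boldsymbol{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})$. Without max-$\boldsymbol{r}_E$ weights yet, we use this here to define AllRAD by a minimum-mean-square-error problem that is expressed as an integral over all panning directions $\boldsymbol{\theta}$

$$\min_{\boldsymbol{D}} \int_{\mathbb{S}^2} \left\|\boldsymbol{g}_{\mathrm{VBAP}}(\boldsymbol{\theta}) - \boldsymbol{D}\,\boldsymbol{y}_N(\boldsymbol{\theta})\right\|^2 \mathrm{d}\boldsymbol{\theta}. \tag{4.45}$$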


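Equivalently, as described by Zotter and Frank [40] who coined the name, we may define AllRAD as VBAP synthesis on the physical loudspeakers when using as multiple-virtual-source inputs the Ambisonic panning function $g_{\mathrm{ambi}}(\boldsymbol{\theta}) = \boldsymbol{y}_N^T(\boldsymbol{\theta})\,\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s)$ sampled at an optimal layout of virtual loudspeakers. Here, we write the synthesis as the integral over infinitely many virtual loudspeakers $\boldsymbol{\theta}$,

$$\boldsymbol{g} = \int \boldsymbol{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})\,g_{\mathrm{ambi}}(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \int \boldsymbol{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})\,\boldsymbol{y}_N^T(\boldsymbol{\theta})\,\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s)\,\mathrm{d}\boldsymbol{\theta}$$

$$= \underbrace{\int \boldsymbol{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})\,\boldsymbol{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta}}_{:=\boldsymbol{D}}\;\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s).$$

We can obviously pull the term $\mathrm{diag}\{\boldsymbol{a}_N\}\,\boldsymbol{y}_N(\boldsymbol{\theta}_s)$ out of the integral. The remaining integral defines the AllRAD matrix $\boldsymbol{D}$. We may interpret it as a transformation of the VBAP loudspeaker gain functions $\boldsymbol{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})$ into spherical harmonic coefficients. In the original paper [40], AllRAD is evaluated by an optimal layout of discrete virtual loudspeakers

$$\boldsymbol{D} = \int_{\mathbb{S}^2} \boldsymbol{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})\,\boldsymbol{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \frac{4\pi}{\hat{L}}\sum_{l=1}^{\hat{L}} \boldsymbol{g}_{\mathrm{VBAP}}(\hat{\boldsymbol{\theta}}_l)\,\boldsymbol{y}_N^T(\hat{\boldsymbol{\theta}}_l) = \frac{4\pi}{\hat{L}}\,\hat{\boldsymbol{G}}\,\hat{\boldsymbol{Y}}_N^T, \tag{4.46}$$

using the directions $\{\hat{\boldsymbol{\theta}}_l\}$ of a $t$-design. As VBAP's gain functions are not smooth (their derivatives are discontinuous), they are order-unlimited, and a $t$-design of sufficiently high $t$ should be used. In 3D practice, the 5200-point Chebyshev-type design from [33] is dense enough. Note that the VBAP part permits improvements by insertion and downmix of imaginary loudspeakers to adapt to asymmetric or hemispherical layouts, as suggested in the original paper [40], cf. Sect. 3.3.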

Note that the decoder needs to be scaled properly. For instance, the norm of the 
omnidirectional component (first column) could be equalized to one, as it would 
typically be with a sampling decoder; there are alternative strategies to circumvent 
the scaling problem [41]. 
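
The discretized construction of Eq. (4.46) can be sketched in 2D as follows, with pairwise amplitude panning on a ring taking the role of VBAP and a dense regular ring of virtual loudspeakers taking the role of the $t$-design. All function names, the number of virtual loudspeakers, the unit weights $\boldsymbol{a}_N = 1$, and the particular scaling (unit norm of the omnidirectional column) are assumptions for illustration; this is not the implementation of [40].

```python
import numpy as np

def circ_harmonics(N, phi):
    """Real circular harmonics up to order N, orthonormal on the unit circle."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2.0 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.vstack(rows)

def vbap_2d(phi_ls, phi_src):
    """Pairwise amplitude panning on a ring (2D VBAP); returns gains of shape (L, P).
    Assumes that adjacent loudspeakers are less than 180 degrees apart."""
    phi_ls = np.asarray(phi_ls, dtype=float)
    order = np.argsort(np.mod(phi_ls, 2.0 * np.pi))       # loudspeakers sorted by azimuth
    units = np.stack([np.cos(phi_ls), np.sin(phi_ls)])    # 2 x L direction vectors
    G = np.zeros((len(phi_ls), len(phi_src)))
    for p, phi in enumerate(phi_src):
        target = np.array([np.cos(phi), np.sin(phi)])
        for i in range(len(order)):
            pair = [order[i], order[(i + 1) % len(order)]]
            g = np.linalg.solve(units[:, pair], target)
            if np.all(g >= -1e-9):                        # direction lies within this pair
                g = np.clip(g, 0.0, None)
                G[pair, p] = g / np.linalg.norm(g)        # unit-energy normalization
                break
    return G

def allrad_2d(N, phi_ls, n_virtual=360):
    """AllRAD sketch: D = (S_1 / L_hat) * G_hat @ Y_hat^T, cf. Eq. (4.46) in 2D."""
    phi_v = 2.0 * np.pi * np.arange(n_virtual) / n_virtual
    G_hat = vbap_2d(phi_ls, phi_v)                        # L x L_hat
    Y_hat = circ_harmonics(N, phi_v)                      # (2N+1) x L_hat
    D = (2.0 * np.pi / n_virtual) * G_hat @ Y_hat.T
    return D / np.linalg.norm(D[:, 0])                    # unit-norm omnidirectional column

N = 3
phi_gap = np.deg2rad(np.delete(np.arange(0, 360, 45), 6))  # ring of 8 minus 270 deg
D_allrad = allrad_2d(N, phi_gap)
g = D_allrad @ circ_harmonics(N, 0.0)[:, 0]                # pan to the loudspeaker at 0 deg
print("gains when panning to 0 deg:", np.round(g, 3))
```

Because the VBAP stage only needs adjacent loudspeaker pairs, the same code accepts irregular rings without any constraint relating $L$ to $N$, which is the practical advantage emphasized in the text.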
4.9.7 EPAD and AllRAD on Sub-optimal Layouts


Figure 4.16 shows the improvement achieved with EPAD and AllRAD on an equiangular arrangement that is suboptimal because of the missing loudspeaker at −90°. The decoders manage either to stabilize the loudness perfectly (EPAD) or to keep the directional and spread mapping errors small (AllRAD). We notice that for EPAD, with the constraint $L \geq 2N+1$ just fulfilled for $N = 3$ and $L = 7$ in the simulation, it would not be possible to remove any further loudspeaker without degradation.


4.9.8 Decoding to Hemispherical 3D Loudspeaker Layouts 


In typical loudspeaker playback situations for a large audience, a solid floor and no loudspeakers below ear level are considered practical for several reasons. However, this does not permit decoding by sampling with optimal $t$-design layouts covering all directions. As shown above, EPAD and AllRAD do not require such arrays. And yet, they still require some care when used with hemispherical loudspeaker layouts, see [15, 40] for further reading.


EPAD with hemispherical loudspeaker layouts. Even for a hemispherical layout, the energy-preserving decoding method requires $L \geq (N+1)^2$ loudspeakers to achieve a perfectly panning-invariant loudness. However, this is counter-intuitive: Why should one need at least as many loudspeakers on a hemisphere as are required for same-order playback on a full sphere? Shouldn't the number be half as many?
Table 4.1 Integration ranges $0 \leq \vartheta \leq \vartheta_{\max}$ to obtain $(N+1)(N+2)/2$ Slepian functions with minimum loudness fluctuation $\frac{\max E}{\min E}$ for panning on the hemisphere

$\frac{\max E}{\min E}$: 0.5 dB | 0.4 dB | 0.3 dB | 0.3 dB | 0.2 dB | 0.3 dB


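We can show that while the spherical harmonics are orthonormal on the sphere $\mathbb{S}^2$, i.e. $\int_{\mathbb{S}^2} \boldsymbol{y}_N(\boldsymbol{\theta})\,\boldsymbol{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \boldsymbol{I}$, they are not orthogonal on the hemisphere $\mathbb{S}$, i.e. the part of $\mathbb{S}^2$ with $\vartheta \leq \vartheta_{\max}$:

$$\int_{\mathbb{S}} \boldsymbol{y}_N(\boldsymbol{\theta})\,\boldsymbol{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \boldsymbol{G}. \tag{4.47}$$

Here, $\boldsymbol{G}$ is called the Gram matrix, and it is evaluated by $\frac{4\pi}{\hat{L}}\sum_{l:\,\hat{\vartheta}_l \leq \vartheta_{\max}} \boldsymbol{y}_N(\hat{\boldsymbol{\theta}}_l)\,\boldsymbol{y}_N^T(\hat{\boldsymbol{\theta}}_l)$ using a high-enough $t$-design. By singular-value decomposition of the positive semi-definite matrix $\boldsymbol{G} = \boldsymbol{Q}\,\mathrm{diag}\{\boldsymbol{s}\}\,\boldsymbol{Q}^T$, with $\boldsymbol{Q}^T\boldsymbol{Q} = \boldsymbol{Q}\boldsymbol{Q}^T = \boldsymbol{I}$, we diagonalize $\boldsymbol{G}$ and find new basis functions $\tilde{\boldsymbol{y}}_N(\boldsymbol{\theta})$, the so-called Slepian functions [42], that are orthogonal on $\mathbb{S}$

$$\boldsymbol{Q}^T\boldsymbol{G}\,\boldsymbol{Q} = \mathrm{diag}\{\boldsymbol{s}\} = \int_{\mathbb{S}} \boldsymbol{Q}^T\boldsymbol{y}_N(\boldsymbol{\theta})\,\boldsymbol{y}_N^T(\boldsymbol{\theta})\,\boldsymbol{Q}\,\mathrm{d}\boldsymbol{\theta}, \quad\Rightarrow\quad \tilde{\boldsymbol{y}}_N(\boldsymbol{\theta}) = \boldsymbol{Q}^T\boldsymbol{y}_N(\boldsymbol{\theta}).$$

Typically, the singular values in $\boldsymbol{s}$ are sorted descendingly, $s_1 \geq s_2 \geq \cdots \geq s_{(N+1)^2}$, so that it is possible to cut out the basis functions of significantly large contribution to the upper hemisphere $\mathbb{S}$ by

$$\tilde{\boldsymbol{y}}_N(\boldsymbol{\theta}) = [\boldsymbol{I},\,\boldsymbol{0}]\,\boldsymbol{Q}^T\boldsymbol{y}_N(\boldsymbol{\theta}). \tag{4.48}$$

Typically, the numerical integral is extended to slightly below the horizon, see Table 4.1, so that truncation to the $(N+1)(N+2)/2$ most significant basis functions, see Fig. 4.17, produces a minimum fluctuation in the loudness measure $E = \|\tilde{\boldsymbol{y}}_N(\boldsymbol{\theta})\|^2$ for panning on the hemisphere.

With $\tilde{\boldsymbol{y}}_N(\boldsymbol{\theta})$, EPAD is calculated in the same way as for the ordinary harmonics

$$\tilde{\boldsymbol{Y}}_N^T = \boldsymbol{U}\,[\mathrm{diag}\{\boldsymbol{s}\},\,\boldsymbol{0}]^T\,\boldsymbol{V}^T, \qquad \tilde{\boldsymbol{D}} = \boldsymbol{U}\,[\boldsymbol{I},\,\boldsymbol{0}]^T\,\boldsymbol{V}^T \tag{4.49}$$

with the main difference that the lower limit for the number of loudspeakers decreases to $L \geq (N+1)(N+2)/2$. Interfaced to the spherical harmonics by $[\boldsymbol{I},\,\boldsymbol{0}]\,\boldsymbol{Q}^T$, the hemispherical energy-preserving decoder becomes

$$\boldsymbol{D} = \tilde{\boldsymbol{D}}\,[\boldsymbol{I},\,\boldsymbol{0}]\,\boldsymbol{Q}^T. \tag{4.50}$$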
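
The chain of Eqs. (4.47) to (4.50) can be tried out on a 2D analogue, in which circular harmonics are restricted to an arc slightly larger than the upper semicircle. This is only a sketch under stated assumptions: the arc limits, the midpoint-rule quadrature (instead of a $t$-design), the number $N+1$ of retained functions (a guessed 2D counterpart of $(N+1)(N+2)/2$), and all function names are illustrative choices and are not taken from the text.

```python
import numpy as np

def circ_harmonics(N, phi):
    """Real circular harmonics up to order N, orthonormal on the unit circle."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2.0 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.vstack(rows)

def slepian_transform(N, phi_min, phi_max, n_keep, n_grid=720):
    """Gram matrix on the arc [phi_min, phi_max], its eigenvalues, and the
    truncated transform T = [I, 0] Q^T, cf. Eqs. (4.47)-(4.48)."""
    phi = phi_min + (phi_max - phi_min) * (np.arange(n_grid) + 0.5) / n_grid
    Y = circ_harmonics(N, phi)
    G = (phi_max - phi_min) / n_grid * Y @ Y.T       # midpoint-rule integral
    s, Q = np.linalg.eigh(G)                         # eigenvalues in ascending order
    idx = np.argsort(s)[::-1]                        # re-sort descendingly
    return G, s[idx], Q[:, idx][:, :n_keep].T        # T has shape (n_keep, 2N+1)

N = 3
n_keep = N + 1                                       # assumed 2D analogue of (N+1)(N+2)/2
G, s, T = slepian_transform(N, np.deg2rad(-10.0), np.deg2rad(190.0), n_keep)

phi_ls = np.deg2rad([0.0, 45.0, 90.0, 135.0, 180.0]) # 5 loudspeakers, fewer than 2N+1 = 7
Y_tilde = T @ circ_harmonics(N, phi_ls)              # Slepian functions at the loudspeakers
U, _, Vt = np.linalg.svd(Y_tilde.T, full_matrices=False)
D_tilde = U @ Vt                                     # Eq. (4.49), shape L x n_keep
D = D_tilde @ T                                      # Eq. (4.50), shape L x (2N+1)

phi_pan = np.deg2rad(np.arange(0, 181, 15))
E = np.sum((D @ circ_harmonics(N, phi_pan)) ** 2, axis=0)
print("eigenvalues of G:", np.round(s, 3))
print("loudness fluctuation on the arc in dB:", 10 * np.log10(E.max() / E.min()))
```

The decoder now works with fewer loudspeakers than harmonics, and its loudness equals the squared norm of the truncated Slepian functions, so any remaining fluctuation stems from the truncation and the chosen integration range rather than from the loudspeaker layout.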
Fig. 4.17 Slepian basis functions for the upper hemisphere, composed of the spherical harmonics up to 3rd order, using $0° \leq \vartheta \leq 113°$ as integration interval and $(N+1)(N+2)/2$ functions. To get these nice shapes, the Slepian functions were found separately for every degree $m$, where they were moreover de-mixed by QR decomposition


AllRAD with hemispherical loudspeaker layouts. Because of the vector-base amplitude panning involved, all-round Ambisonic decoding (AllRAD) is comparatively robust to irregular loudspeaker setups. Still, a hemispherical layout does not contain any loudspeaker direction vector pointing to the lower half space, and therefore one could just omit information of the lower half space. However, the Ambisonic panning function implies a directional spread, so that panning to exactly the horizon also produces content below, whose omission causes: (i) a loss in loudness, (ii) a slight elevation of the perceived direction, cf. Fig. 4.18.

As discussed in Sect. 3.3 on triangulation, the insertion of imaginary loudspeakers fixes this behavior. In the case of hemispherical loudspeaker layouts, it is not necessary to downmix the signal of the imaginary loudspeaker at nadir to stabilize both loudness and localization for panning to the horizon.

Signal contributions below but close to the horizon largely contribute to the horizontal loudspeakers, and it is therefore safe to discard the signal that would feed the imaginary loudspeaker at nadir without loss of loudness. Moreover, this contribution from below also reinforces signals on the horizontal loudspeakers, so that localization is pulled back down. Both can be observed in Fig. 4.18, which shows the loudness measure $E$ as well as mislocalization and width by the measure $\boldsymbol{r}_E$ using max-$\boldsymbol{r}_E$-weighted AllRAD with 5th-order Ambisonics along a vertical panning circle on the IEM mobile Ambisonics Array (mAmbA). It consists of 25 loudspeakers set up in rings of 8, 8, 4, 4, and 1 loudspeakers at 0, 20, 40, 60, and 90 degrees elevation. Rings two and four start at 0 degrees azimuth, the others are rotated by half the loudspeaker spacing.


Performance comparison on hemispherical layouts. Figure 4.18 shows a comparison of AllRAD and EPAD decoding to the 25-channel mAmbA hemispherical loudspeaker layout.

Fig. 4.18 Perceptual measures for 5th-order max-$\boldsymbol{r}_E$-weighted AllRAD on the IEM mAmbA layout [43] with (black) and without insertion of the bottom imaginary loudspeaker (black dotted) whose signal is discarded, and max-$\boldsymbol{r}_E$-weighted EPAD (gray), for panning on a vertical circle: $E$ in dB (top), orientation error of $\boldsymbol{r}_E$ in degrees (middle), and width expressed by $\arccos\|\boldsymbol{r}_E\|$ in degrees (bottom). The thin dashed line shows AllRAD without imaginary loudspeakers


While AllRAD produces a loudness fluctuation roughly spanning 1 dB for panning on the hemisphere (top in Fig. 4.18), EPAD only exhibits 0.3 dB, as specified in Table 4.1. While loudness differences of less than 0.5 dB can be heard in monophonic playback of noise, it is safe to assume that a weak directional loudness fluctuation of less than 1 dB is normally inaudible. In this regard, loudness fluctuation should be no problem with either EPAD or AllRAD.

Concerning the directional mapping, EPAD produces a more strongly pronounced ripple, with $\boldsymbol{r}_E$ indicating that sounds on the horizon $\vartheta_s = \pm 90°$ are pulled upwards towards 0° more with EPAD (7°) than with AllRAD (3°). In terms of width, both EPAD and AllRAD exhibit the ~20° average associated with max-$\boldsymbol{r}_E$ weighting. However, EPAD also produces a greater fluctuation, and it widens up to about 30° for panning to the horizon $\vartheta_s = \pm 90°$.

With the 9 loudspeakers of the ITU [44] 4+5+0 layout (horizontal ring: $\varphi = 0°, \pm 30°, \pm 120°$, upper ring at 40° elevation with $\varphi = \pm 30°, \pm 120°$), it is no longer possible to use EPAD with 5th order, which would be the optimal resolution for the front loudspeaker triplet. EPAD only supports orders up to $N = 2$, and to lose level towards below-horizon directions, we can use the reduced set of 6 Slepian functions; alternatively, using all 9 spherical harmonics of $N = 2$ would also be conceivable. For AllRAD, imaginary loudspeakers are inserted at the sides at azimuth/elevation ±75°/27°, up at 0°/78°, at the back at 180°/35°, and below at 0°/−90°. It is reasonable to downmix the imaginary loudspeakers with a factor of one for up, sides, and back, and to re-normalize the VBAP gain matrix, while discarding the signal of the imaginary loudspeaker below. AllRAD permits using the order $N = 5$, which resolves the frontal loudspeaker triplet much better for horizontal panning.

Figure 4.19 shows the result of max-$\boldsymbol{r}_E$-weighted 2nd-order EPAD and 5th-order AllRAD for the 4+5+0 layout using a vertical panning curve. While the perfectly constant loudness measure of EPAD might be favored over the almost +3 dB loudness increase at front and back for AllRAD, AllRAD's lower directional error, narrower width mapping, greater flexibility, and simplicity have often proven to be clearly superior in practice.

Fig. 4.19 Perceptual measures for Ambisonic panning on the ITU [44] 4+5+0 layout with insertion and downmix of imaginary loudspeakers at the sides, back, and top, and insertion and disposal of an imaginary loudspeaker below for AllRAD. Measures are evaluated for max-$\boldsymbol{r}_E$-weighted 5th-order AllRAD (black) and 2nd-order EPAD (gray), for panning on a vertical circle: $E$ in dB (top), orientation error of $\boldsymbol{r}_E$ in degrees (middle), and width expressed by $\arccos\|\boldsymbol{r}_E\|$ in degrees (bottom)


4.10 Practical Studio/Sound Reinforcement Application 
Examples 


This section analyzes the application of 3D Ambisonic amplitude panning consisting of encoding and AllRAD to studio (with typical setups of 2 m radius) and sound reinforcement applications (for an audience of, e.g., 250 people). Application scenarios are sketched in [43], and various other examples are given below. Requirements of a constant loudness and width are analyzed below, and as sound reinforcement requires a particularly large sweet area, the $\boldsymbol{r}_E$ vector model for off-center listening positions from Sect. 2.2.9 is used to depict the sweet area size.

The analysis of decoders above described loudness measures for panning on a circle. To observe them for panning across all directions, the world-map-like mappings in Figs. 4.20 and 4.22, which use a gray-scale representation of the loudness and width measures, are more reasonable. For several loudspeaker layouts, the axes of these maps are azimuth horizontally and zenith angle vertically, and the gray scale displays the loudness measure $E$ in dB (left column) and the width measure $\arccos\|\boldsymbol{r}_E\|$ in degrees (right column). As 5th-order max-$\boldsymbol{r}_E$-weighted AllRAD typically produces minor directional mapping errors, they are not explicitly shown in Figs. 4.20 and 4.22. However, the mappings of the sweet area size of plausible localization in Figs. 4.21 and 4.23 illustrate the usefulness of the systems for the listening areas hosting the number of listeners targeted for either the studio or the sound reinforcement application.

Figure 4.20 illustrates AllRAD's tendency to attenuate signals in too closely spaced loudspeaker ensembles, as in the front section of the ITU [44] 4+5+0 layout. By contrast, for instance the mAmbA layout in Fig. 4.22 only has 8 loudspeakers on the horizon, and signals panned to the widely spaced below-horizon triangles tend to get louder. Moreover, it is easier for loudspeaker systems of many channels such as IEM CUBE, mAmbA, Lobby, and Ligeti Hall in Fig. 4.22 to yield smooth loudness and width mappings. Still, even with only a few loudspeakers, slight direction adjustments in the layout can fix some of the behavior, as with the IEM Production Studio, whose ±45° loudspeakers in the elevated layer are superior to a ±30° spacing.

Fig. 4.20 Comparison of 5th-order max-$\boldsymbol{r}_E$-weighted AllRAD for panning across all directions on hemispherical loudspeaker layouts in studios, with panel (b) showing the IEM Production Studio (4 listeners). The left column shows the loudness measure $E$ in dB and the right-most column the width measure $\arccos\|\boldsymbol{r}_E\|$ in degrees; the loudspeaker positions are marked with a white + sign

Fig. 4.21 Comparison of the calculated sweet area size for 5th-order max-$\boldsymbol{r}_E$-weighted AllRAD for panning across all directions on hemispherical loudspeaker layouts in studios. As a plausibility definition, the directional mapping errors depending on the listening position should stay within angular bounds (e.g. 10°)
Fig. 4.22 Comparison of 5th-order max-$\boldsymbol{r}_E$-weighted AllRAD for panning across all directions on various hemispherical loudspeaker layouts for sound reinforcement. The left column shows the loudness measure $E$ in dB and the right-most column the width measure $\arccos\|\boldsymbol{r}_E\|$ in degrees; the loudspeaker positions are marked with a white + sign


A hint for designing good decoders sometimes is idealization: often it is better to disregard the true loudspeaker setup locations and feed the decoder design with idealized positions instead. In this way, one can trade slight directional distortions for a more uniform loudness distribution. For instance at the IEM CUBE, loudspeaker locations of the horizontal ring could be idealized to 30° to get a smoother loudness mapping as the one shown in Fig. 4.22.

Fig. 4.23 Comparison of the calculated sweet area size for 5th-order max-$\boldsymbol{r}_E$-weighted AllRAD for panning across all directions on various hemispherical loudspeaker layouts for sound reinforcement. As a plausibility definition, the directional mapping errors depending on the listening position should stay within angular bounds (e.g. 10°)


4.11 Ambisonic Decoding to Headphones 


Typically, Ambisonic decoding to headphones can be done similarly as with loudspeakers, except that the loudspeaker signals are rendered to headphones by convolution with the head-related impulse responses (HRIRs) of the corresponding playback directions. Various databases of such HRIRs can be found, e.g., on the website SOFA-conventions.² This headphone decoding approach classically uses a small set of so-called virtual loudspeakers, as found in many places in the technical literature, e.g. in the pioneering works of Jean-Marc Jot et al. [9] or Jérôme Daniel [10]. It is relevant in many other important works [18, 45, 46] and in the SADIE project,³ and it is employed in Sect. 1.4.2 on first-order Ambisonics.


Coarse. However, as outlined in some research papers [9, 18, 46], these approaches have in common that low-order Ambisonic synthesis is problematic. On the one hand, when inserting a dense grid of virtual-loudspeaker HRIRs, the Ambisonic smoothing attenuates high frequencies at frontal and dorsal directions. On the other hand, a coarse grid of virtual-loudspeaker HRIRs, which had been the solution for a long time, does not attenuate high frequencies, but its spatial quality strongly depends on the particular grid layout or orientation [46]. An early paper by Jot [9] proposed to remove the time delays of the HRIRs before Ambisonic decomposition, and then to re-insert the otherwise missing interaural time delay afterwards for any sound panned in Ambisonics, which unfortunately yields an object-based panning system rather than a scene-based Ambisonic system.


Dense. Some dense-grid approaches propose to keep the HRIR time delays, or if formulated in the frequency domain, the phases of the head-related transfer functions (HRTFs), and hereby stay in a scene-based Ambisonic format, while correcting spectral deficiencies by diffuse-field or interaural-covariance equalization [18, 47]. Finally, the most recent solutions proposed by Jin, Sun, and Epain [17, 48] or Zaunschirm, Schörkhuber, and Höldrich [20, 21] modify the HRIR time delays/HRTF phases, but only above, e.g., 3 kHz, without any object-based re-insertion afterwards. The omission of high-frequency interaural time-delay/phase information is a reasonable trade-off made in favor of a more important accuracy in spectral magnitude.


What does directional HRIR smoothing do to high frequencies? The geometrical theory of diffraction [49] suggests that HRIRs must always contain at least the delay to the ear of either the shortest direct path or the shortest indirect path via the surface of the head. For a spherical head model with the radius $R = 0.0875$ m and speed of sound $c = 343\,\frac{\mathrm{m}}{\mathrm{s}}$, the Woodworth-Schlosberg formula [50] is composed of this consideration, see Fig. 4.24. The left ear receives a distant horizontal sound from the azimuth interval $0 \leq \varphi \leq \frac{\pi}{2}$ as direct sound anticipated by $\tau = -\frac{R}{c}\sin\varphi$, or for $-\frac{\pi}{2} \leq \varphi < 0$ as an indirect sound delayed by $\tau = -\frac{R}{c}\,\varphi$,

$$\tau(\varphi) = -\frac{R}{c}\,\min\{\sin\varphi,\ \varphi\}, \tag{4.51}$$

as plotted in Fig. 4.25a, and recognizable from dummy-head measurements⁴ in Fig. 4.25b.

² https://www.sofaconventions.org.
³ https://www.york.ac.uk/sadie-project/database.html.

Fig. 4.25 Time delay to the ear depending on azimuth of horizontal sound (a) and (b) 360°-measured KU100 dummy head HRIRs set from TH Köln (color displays dB levels)
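
The delay model of Eq. (4.51) is easily evaluated numerically. The short sketch below does so with the head radius and speed of sound quoted above; the function name `ear_delay` is a hypothetical choice, and the restriction to azimuths between −90° and 90° follows the interval for which the formula is stated.

```python
import numpy as np

R = 0.0875      # head radius in m
C = 343.0       # speed of sound in m/s

def ear_delay(phi):
    """Delay to the left ear per Eq. (4.51) for azimuth phi in [-pi/2, pi/2] (radians).
    Negative values mean that the sound arrives earlier than at the head center."""
    phi = np.asarray(phi, dtype=float)
    return -(R / C) * np.minimum(np.sin(phi), phi)

phi = np.deg2rad([-90.0, -45.0, 0.0, 45.0, 90.0])
print(np.round(1e6 * ear_delay(phi)), "microseconds")
```

If one assumes that the right ear sees the mirrored angle, the difference `ear_delay(-np.pi / 2) - ear_delay(np.pi / 2)` gives an interaural time difference of roughly 0.66 ms for a source at the side, which is the order of magnitude commonly associated with the spherical-head model.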

If the HRIR is smoothed across an angular range, the time-delay curve gets spread across time as well, see Fig. 4.26. In this way, depending on whether the smoothing uses a discrete or a continuous set of directions, one obtains something like either a comb filter or a sinc-shaped frequency response. This smoothing is least disturbing around the direct-ear side, as shown left in Fig. 4.26, and, as the indirect ear also encounters high-frequency shadowing effects, it is most disturbing for frontal and rear sounds at 0° or 180°, as shown right in Fig. 4.26. The corresponding frequency responses are roughly exemplified by what third-order-Ambisonics-equivalent smoothing would do to either 45°-spaced HRIRs in Fig. 4.27a or 15°-spaced ones in Fig. 4.27b.


⁴ Data HRIR_CIRC360.sofa from http://sofacoustics.org/data/database/thk.

Fig. 4.26 Directionally smoothed playback to multiple HRIRs within a window, e.g. 30°, causes different impulse response shapes at 90° and 0°

Fig. 4.27 Differences of directionally smoothed HRTF frequency responses for a horizontal sound from either 0° or 90°, smoothed within a ±22.5° window, which roughly corresponds to the Ambisonics order N = 3; a and b either use a grid of 45° or 7.5° spaced HRIRs. The dashed line shows the theoretical frequency limit 1.87 kHz for N = 3


To get an upper frequency limit, it is insightful to work in the frequency domain, where the HRIR is denoted head-related transfer function (HRTF). A simplified linearized-phase version around $\varphi = 0$ uses $\tau \approx -\frac{R}{c}\,\varphi$, and the result in the Fourier transform with $\omega = 2\pi f$ is

$$H \propto e^{-\mathrm{i}\omega\tau(\varphi)} = e^{\mathrm{i}\frac{\omega R}{c}\varphi}. \tag{4.52}$$

To represent it by a circular or spherical harmonics transformation limited to the order $N$, a maximum phase change represented by the harmonic $e^{\mathrm{i}N\varphi}$ implies that we can only resolve the phase up to $\frac{\omega R}{c} < N$; hence the range of accurate operation is limited in frequency

$$f_N < \frac{c\,N}{2\pi R} = N \cdot 624\,\mathrm{Hz}. \tag{4.53}$$

As the high-frequency HRTF phase evolves more rapidly over the angle than what the finite order can represent, this typically yields attenuation of the high frequencies when obtaining circular/spherical harmonics coefficients by the transformation integral.
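
As a numerical illustration of Eq. (4.53), and of why smoothing attenuates high frequencies, the sketch below evaluates the frequency limit and the magnitude that remains when the linearized phase term of Eq. (4.52) is averaged over an angular window. The uniform (rectangular) averaging window is an assumption for illustration that leads to a sinc-shaped response; it is not the actual Ambisonic smoothing kernel, and the function names are hypothetical.

```python
import numpy as np

R = 0.0875      # head radius in m
C = 343.0       # speed of sound in m/s

def frequency_limit(N):
    """Upper frequency limit f_N = c N / (2 pi R) of Eq. (4.53), in Hz."""
    return C * N / (2.0 * np.pi * R)

def smoothed_magnitude(f, half_width):
    """Magnitude of exp(i*omega*R/c*phi) averaged uniformly over |phi| <= half_width."""
    x = 2.0 * np.pi * np.asarray(f, dtype=float) * R / C * half_width
    return np.abs(np.sinc(x / np.pi))        # np.sinc(u) = sin(pi u) / (pi u)

for N in (1, 3, 5):
    print("N = %d: f_N = %.0f Hz" % (N, frequency_limit(N)))

f = np.array([500.0, 2000.0, 4000.0, 8000.0])
level = 20 * np.log10(smoothed_magnitude(f, np.deg2rad(22.5)))
print(np.round(level, 1), "dB for a +/-22.5 degree window")
```

The ±22.5° window corresponds to the value quoted for $N = 3$ in the caption of Fig. 4.27; with it, the averaged response stays close to 0 dB well below the limit and drops markedly above it.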

Directional smoothing of the discrete directional HRTFs causes relevant spectral problems, regardless of whether directional smoothing is done by Ambisonics, VBAP, or MDAP. Mainly the geometric delay in the HRIRs is responsible for the emerging comb-filter or low-pass behavior. One could pull out the linear phase trend above the frequency limit and re-insert it, but is re-insertion necessary?
4.11.1 High-Frequency Time-Aligned Binaural Decoding (TAC)


As a prerequisite for their binaural Ambisonic decoders, Schörkhuber et al. [21] tested above which frequency the removal of the HRTF linear phase trend remains inaudible in direct HRTF-based rendering without panning or smoothing. In fact, most of their listeners could not distinguish the absence of the linear phase trend when it was removed above 3 kHz for various sound examples (drums, speech, pink noise, rendered at the directions 10°, −45°, 80°, −130°). They had their subjects compare the result to a reference with unaltered HRTFs, and the result is analyzed in Fig. 4.28.

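By this finding, it is possible to split up each of the $2 \times 1$ HRIRs $\boldsymbol{h}(t, \boldsymbol{\theta})$ into an unaltered low-pass band and a time-aligned high-pass band to unify the high-frequency HRIR delay

$$\hat{\boldsymbol{h}}(t, \boldsymbol{\theta}) = \boldsymbol{h}_{\leq 3\,\mathrm{kHz}}(t, \boldsymbol{\theta}) + \begin{bmatrix} h^{\mathrm{left}}_{>3\,\mathrm{kHz}}[t - \tau(\arcsin\theta_y),\ \boldsymbol{\theta}] \\ h^{\mathrm{right}}_{>3\,\mathrm{kHz}}[t + \tau(\arcsin\theta_y),\ \boldsymbol{\theta}] \end{bmatrix}. \tag{4.54}$$

The time delay model $\tau(\varphi)$ uses the angle to the left/right ear on the positive/negative $y$ axis, so $\arccos(\pm\theta_y)$, but shifted by 90°, hence $\varphi = \pm\arcsin\theta_y$.

This removal allows using all available HRIRs of dense measurement sets for binaural synthesis of high accuracy, with a suitable linear Ambisonic decoder such as AllRAD. Assuming the resulting modified left and right HRIRs for all directions are denoted as the $2 \times L$ matrix $\hat{\boldsymbol{H}}(t) = [\hat{\boldsymbol{h}}(t, \boldsymbol{\theta}_1), \ldots, \hat{\boldsymbol{h}}(t, \boldsymbol{\theta}_L)]$, the $2 \times (N+1)^2$ filter set for decoding each of the Ambisonic channels to the ears becomes:

$$\hat{\boldsymbol{H}}_{\mathrm{SH}}(t) = \hat{\boldsymbol{H}}(t)\,\boldsymbol{D}\,\mathrm{diag}\{\boldsymbol{a}_N\}. \tag{4.55}$$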
Fig. 4.28 Experiment on audibility of removal of the linear phase trend from HRTF above a varied cutoff frequency from [21] showing medians and 95% confidence intervals

Fig. 4.29 Exemplary horizontal cross-sections of linear (lin), time-aligned (ta), and MagLS/magnitude-least-squares (mls) of third order N = 3, compared to high-order N = 35 (max) Ambisonic left-ear HRTF representations of the TH Cologne HRIR_L2702.sofa set


Results achieved by pseudo-inverse decoding to the HRIRs time-aligned in this way, using $R = 0.085$ m with $N = 3$ from the 2702-direction Cologne HRIRs,⁵ are shown in Fig. 4.29. The resulting polar patterns (ta) clearly outperform the linear decomposition (lin) at frequencies above 2 kHz in representing the original HRTFs (max).


4.11.2 Magnitude Least Squares (MagLS) 


As an alternative to the disposal of high-frequency time delays, Schörkhuber et al. present an optimum-phase approach [21] that disregards the phase match in favor of an improved magnitude match above cutoff. Formulated exemplarily for the left ear, across every HRTF direction $\boldsymbol{\theta}_l$, and for every discrete frequency $\omega_k$, with $h_{l,k} = h(\boldsymbol{\theta}_l, \omega_k)$, this becomes

$$\min_{\hat{\boldsymbol{h}}_{\mathrm{SH},k}} \sum_{l=1}^{L} \left[\,\left|\boldsymbol{y}_N(\boldsymbol{\theta}_l)^T\,\hat{\boldsymbol{h}}_{\mathrm{SH},k}\right| - \left|h_{l,k}\right|\,\right]^2. \tag{4.56}$$

Typically, one would need to solve magnitude least squares or magnitude squares least squares tasks with semidefinite relaxation, see Kassakian [51].

In practice, however, results turn out to be perfect already with an iterative combination of the reconstructed phase $\hat{\phi}_{l,k-1}$ from the previous frequency $\omega_{k-1}$ with the HRTF magnitude $|h_{l,k}|$ of the current frequency $\omega_k$, before a linear decomposition thereof into spherical harmonic coefficients $\hat{\boldsymbol{h}}_{\mathrm{SH},k}$.

Every frequency below cutoff $\omega_k < 2\pi f_N$ just uses the linear least-squares spherical harmonics decomposition with the left-inverse of the spherical harmonics $\boldsymbol{Y}_N^T$ sampled at the HRTF measurement nodes,

$$\hat{\boldsymbol{h}}_{\mathrm{SH},k} = (\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1}\,\boldsymbol{Y}_N\,[h_{l,k}]_l. \tag{4.57}$$

Continuing with the first frequency above or equal to cutoff, $\omega_k \geq 2\pi f_N$, the algorithm proceeds as:

$$\hat{\phi}_{l,k-1} = \angle\left\{\boldsymbol{y}_N(\boldsymbol{\theta}_l)^T\,\hat{\boldsymbol{h}}_{\mathrm{SH},k-1}\right\}, \tag{4.58}$$

$$\hat{\boldsymbol{h}}_{\mathrm{SH},k} = (\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1}\,\boldsymbol{Y}_N\,\left[\,|h_{l,k}|\;e^{\mathrm{i}\hat{\phi}_{l,k-1}}\right]_l, \tag{4.59}$$

and then moves to the next frequency $k \leftarrow k+1$. The results are typically transformed back to the time domain to get a real-valued impulse response for every spherical harmonic to the regarded ear.

⁵ Data HRIR_L2702.sofa from http://sofacoustics.org/data/database/thk.

The results of the MagLS approach (mls) outperform the time-alignment approach
(ta) in the exemplary results shown for N = 3 in Fig. 4.29, in particular at the
highest frequencies, where the sphere-model-based delay simplification is no longer
sufficiently helpful.
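The iteration of Eqs. (4.57) to (4.59) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `magls_sh_coefficients`, the argument layout (one row of `Y` per HRTF direction, one column of `H` per frequency bin) and the index `k_cut` of the first bin at or above cutoff are assumptions made here, and the toy data are random stand-ins rather than real spherical harmonics or HRTFs.

```python
import numpy as np

def magls_sh_coefficients(Y, H, k_cut):
    """Iterative MagLS decomposition for one ear (illustrative sketch).

    Y     : (L, Q) real matrix, spherical harmonics sampled at the L HRTF directions
    H     : (L, K) complex matrix of HRTFs h_{l,k} for K ascending frequency bins
    k_cut : index of the first bin at or above the cutoff frequency (assumed input)
    """
    Y_left_inv = np.linalg.pinv(Y)  # equals (Y^T Y)^-1 Y^T when Y has full column rank
    K = H.shape[1]
    H_sh = np.zeros((Y.shape[1], K), dtype=complex)
    for k in range(K):
        if k < k_cut or k == 0:
            target = H[:, k]                               # Eq. (4.57): linear decomposition
        else:
            phase = np.angle(Y @ H_sh[:, k - 1])           # Eq. (4.58): phase of previous bin
            target = np.abs(H[:, k]) * np.exp(1j * phase)  # Eq. (4.59): |h| with that phase
        H_sh[:, k] = Y_left_inv @ target
    return H_sh

# Toy data: a frequency-independent response that is exactly representable by Y.
rng = np.random.default_rng(0)
Y_demo = rng.standard_normal((50, 16))
c_demo = rng.standard_normal(16) + 1j * rng.standard_normal(16)
H_demo = np.tile((Y_demo @ c_demo)[:, None], (1, 8))
H_sh_demo = magls_sh_coefficients(Y_demo, H_demo, k_cut=3)
```

For this toy input the phase handed over from the previous bin already matches the target, so every bin should reproduce the coefficients `c_demo`; with measured HRTFs the bins above cutoff would instead trade phase accuracy for magnitude accuracy.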


4.11.3 Diffuse-Field Covariance Constraint 


For both of the above approaches that modify the high-frequency phase, Zaunschirm
et al. [20] note that low-order rendering degrades envelopment in diffuse fields,
so that they introduce an additional covariance constraint as defined by Vilkamo [22].
It can be implemented as a 2 x 2 filter matrix equalizing the resulting frequency- 
domain diffuse-field covariance matrix to the one of the original HRTF datasets. 
On the main diagonal, this covariance matrix shows the diffuse-field ear sensitiv- 
ities (left and right), and off-diagonal it contains the diffuse-field inter-aural cross 
correlation. 

At every frequency, the 2 × 2 diffuse-field covariance matrix of the original,
very-high-order spherical harmonics HRTF dataset $H_{\mathrm{SH}}^H$ of the dimensions
$2 \times (M+1)^2$ with $(M > N)$ is given by

$$R = H_{\mathrm{SH}}^H\, H_{\mathrm{SH}}. \tag{4.60}$$

The derivation why this inner product of spherical harmonic coefficients represents
the diffuse-field covariance is given in Appendix A.5. The low-order high-frequency
modified HRTF coefficient set $\hat{H}_{\mathrm{SH}}^H$ of the dimensions
$2 \times (N+1)^2$ also has a 2 × 2 covariance matrix $\hat{R}$ that will differ from
the more accurate $R$,

$$\hat{R} = \hat{H}_{\mathrm{SH}}^H\, \hat{H}_{\mathrm{SH}}. \tag{4.61}$$

Its diffuse-field reproduction improves after equalizing $\hat{R}$ to $R$ by a 2 × 2
filter matrix $M$,

$$\hat{H}_{\mathrm{SH,corr}} = \hat{H}_{\mathrm{SH}}\, M. \tag{4.62}$$

Fig. 4.30 Covariance constraint filters enhance the binaural decorrelation of MagLS by negative
crosstalk $M_{12}$ and $M_{21}$, under corresponding correction of the diffuse-field
sensitivities $M_{11}$ and $M_{22}$ at playback orders N < 3


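Appendix A.5 shows the derivation of M based on [20, 22]. In summary, it is composed
of factors obtained by Cholesky and SVD matrix decompositions

$$\hat{H}_{\mathrm{SH,corr}} = \hat{H}_{\mathrm{SH}}\, \hat{X}^{-1}\, V U^H X, \tag{4.63}$$

with the Cholesky factors $H_{\mathrm{SH}}^H H_{\mathrm{SH}} = X^H X$ and
$\hat{H}_{\mathrm{SH}}^H \hat{H}_{\mathrm{SH}} = \hat{X}^H \hat{X}$, and the SVD
$X \hat{X}^H = U S V^H$.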


While MagLS binaural decoding with orders higher than 2 or 3 does not require 
covariance correction, the correction enhances the decorrelation of the ear signals 
for 1st to 2nd order reproduction, as shown in Fig. 4.30. 
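As a numerical illustration of Eqs. (4.60) to (4.63), the following sketch builds the 2 × 2 correction matrix from two coefficient sets at a single frequency. The function name, the random test data, and the placement of the hats in the SVD ($X\hat{X}^H = USV^H$) are assumptions of this sketch; the property it checks, that the corrected set reproduces the reference covariance, holds for any unitary factor between $\hat{X}^{-1}$ and $X$.

```python
import numpy as np

def covariance_constraint_matrix(H_ref, H_low):
    """2 x 2 matrix M so that (H_low M)^H (H_low M) equals H_ref^H H_ref.

    H_ref : ((M+1)^2, 2) complex, high-order SH coefficients of left/right HRTF
    H_low : ((N+1)^2, 2) complex, low-order (e.g. MagLS) coefficients
    """
    R = H_ref.conj().T @ H_ref                    # Eq. (4.60)
    R_hat = H_low.conj().T @ H_low                # Eq. (4.61)
    X = np.linalg.cholesky(R).conj().T            # upper factor, R = X^H X
    X_hat = np.linalg.cholesky(R_hat).conj().T    # upper factor, R_hat = X_hat^H X_hat
    U, _, Vh = np.linalg.svd(X @ X_hat.conj().T)  # assumed convention: X X_hat^H = U S V^H
    Q = Vh.conj().T @ U.conj().T                  # unitary factor V U^H
    return np.linalg.solve(X_hat, Q @ X)          # X_hat^-1 V U^H X, Eq. (4.63)

# Random stand-in coefficients for one frequency bin (not real HRTF data).
rng = np.random.default_rng(1)
H_ref = rng.standard_normal((64, 2)) + 1j * rng.standard_normal((64, 2))
H_low = rng.standard_normal((16, 2)) + 1j * rng.standard_normal((16, 2))
M = covariance_constraint_matrix(H_ref, H_low)
H_corr = H_low @ M                                # Eq. (4.62)
```

In a real decoder this would be evaluated per frequency bin, and the resulting sequence of matrices M would be applied as a 2 × 2 filter matrix to the ear signals.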


4.12 Practical Free-Software Examples 


4.12.1 Pd and Circular/Spherical Harmonics 


Similar to the example section on first-order encoding and decoding in Pure Data
(Pd), Fig. 4.31 shows 3rd-order 2D Ambisonic encoding and decoding for an octagon
loudspeaker layout. The implementation [mtx_circular_harmonics] of the circular
harmonics is used from the iemmatrix library, and the numbers for $180/\pi = 57.29$
and $a_m = \cos\left(\frac{\pi}{2}\frac{m}{N+1}\right)$ were pre-calculated. Note the
similarity to the first-order 2D example of Fig. 1.13, to which the main change is the
use of the circular harmonics matrix object.
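Outside of Pd, the same signal flow can be sketched with NumPy for readers who want to check the numbers. The circular-harmonic ordering and the √2 normalization used below are assumptions of this sketch and need not match the convention of [mtx_circular_harmonics]; the decoder is a plain sampling decoder with max-$r_E$ weights $a_m$.

```python
import numpy as np

def circular_harmonics(phi, N):
    """Rows: directions; columns: [1, √2 cos φ, √2 sin φ, ..., √2 cos Nφ, √2 sin Nφ]."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    cols = [np.ones_like(phi)]
    for m in range(1, N + 1):
        cols += [np.sqrt(2) * np.cos(m * phi), np.sqrt(2) * np.sin(m * phi)]
    return np.stack(cols, axis=1)

N = 3                                              # Ambisonic order
L = 8                                              # octagon of loudspeakers
phi_ls = np.deg2rad(np.arange(L) * 360.0 / L)      # 0°, 45°, ..., 315°
a_m = np.cos(np.pi / 2 * np.arange(N + 1) / (N + 1))    # max-rE weights per order m
a = np.concatenate(([a_m[0]], np.repeat(a_m[1:], 2)))   # one weight per channel
D = circular_harmonics(phi_ls, N) * a / L          # (L, 2N+1) sampling decoder
y_s = circular_harmonics(np.deg2rad(45.0), N)[0]   # encoder gains for a source at 45°
gains = D @ y_s                                    # resulting loudspeaker gains
```

With the source placed exactly at a loudspeaker direction, that loudspeaker receives the largest gain, and the gains of all eight loudspeakers add up to one.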

For decoding to headphones, programming in Pd also looks rather similar to the
first-order example in Fig. 1.14, only more HRIRs matching the respective loudspeaker
positions need to be employed. To work in 3 dimensions, programming in Pd would also be
similar to the corresponding first-order example of Fig. 1.15, using the matrix object
[mtx_spherical_harmonics]. Typically, pre-calculated decoders including AllRAD and
max-$r_E$ are used and loaded by, e.g., [mtx D.mtx] into Pd to keep programming simple.

Fig. 4.31 2D encoding and decoding in Pd using [mtx_circular_harmonics] with 3rd
order, 8 equidistant loudspeakers, and max-$r_E$ weighted decoder


4.12.2 Ambix Encoder, IEM MultiEncoder, and IEM
AllRADecoder


For encoding single- or multi-channel signals into Ambisonics, there are the
ambix_encoder_o<N> or ambix_encoder_i<L>_o<N> VST plugins available from
Kronlachner’s ambix plugin suite, or the IEM MultiEncoder from the IEM plugin
suite. As exemplarily shown in Fig. 4.32, the MultiEncoder allows encoding
channel-based multi-channel audio material, where channel-based [52] typically
refers to each channel of the multi-channel material being meant for playback on a
separate loudspeaker of clearly defined direction, cf. [44]. Elsewhere, the embedding
of virtual playback directions can also be found referred to as beds or virtual panning
spots.
Fig. 4.33 AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio
Fig. 4.35 AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio

The IEM AllRADecoder permits manually entering or importing the loudspeaker
coordinates and channel indices, with the coordinates specified by the azimuth and
elevation angle in degrees, as exemplified for the IEM production studio in Fig. 4.33.
The figure also shows that just entering the pure 5 + 7 + 0 layout would produce
the error message Point of origin not within convex hull. Try adding imaginary
loudspeakers.

By adding an imaginary loudspeaker below whose signal is typically omitted,
see Fig. 4.34, it becomes geometrically valid to calculate and employ the resulting
decoder; however, it is better to also insert an imaginary loudspeaker at the rear whose
signal is preserved by specifying the gain value 1, as shown in Fig. 4.35.


4.12.3 Reaper, IEM RoomEncoder, and IEM 
BinauralDecoder 


Particularly relevant for headphone-based listening, rendering of anechoic sounds
will typically not externalize well, as it does not match the mental expectation of
ordinary listening environments [53-56]. To avoid in-head localization instead of
the desired external sound image, one can, e.g., use the IEM RoomEncoder plugin,
see Fig. 4.36. It is based on an image-source room model and encodes first-order wall
reflections involving reflection factors and propagation delays together with the
desired direct sound.

Fig. 4.36 RoomEncoder plug-in
Fig. 4.37 BinauralDecoder plug-in
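To make the image-source idea concrete, the sketch below computes the direct sound and the six first-order wall reflections of a shoebox room. It is not the RoomEncoder implementation: the function name, the single frequency-independent reflection factor, the 1/distance gain law and the speed of sound of 343 m/s are simplifying assumptions. Each returned direction would then be fed to an Ambisonic encoder together with its delay and gain.

```python
import numpy as np

def first_order_image_sources(room, src, lis, refl=0.8, c=343.0):
    """Direct sound plus six first-order reflections of a shoebox room.

    room : (x, y, z) room dimensions in m, walls at 0 and at room[axis]
    src  : source position, lis : listener position (both in m)
    Returns unit direction vectors (7, 3), delays in s (7,), gains (7,).
    """
    room, src, lis = (np.asarray(v, dtype=float) for v in (room, src, lis))
    images, factors = [src.copy()], [1.0]       # index 0: direct sound
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = src.copy()
            img[axis] = 2.0 * wall - src[axis]  # mirror the source at this wall
            images.append(img)
            factors.append(refl)
    vec = np.array(images) - lis                # listener-to-image vectors
    dist = np.linalg.norm(vec, axis=1)
    return vec / dist[:, None], dist / c, np.array(factors) / dist

directions, delays, gains = first_order_image_sources(
    room=(5.0, 4.0, 3.0), src=(1.0, 1.0, 1.0), lis=(3.0, 1.0, 1.0))
```

In this example the direct path is 2 m long, and the image behind the wall at x = 0 lies 4 m from the listener, so it arrives later and weaker from the same side as the direct sound.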


The MagLS approach for Ambisonic decoding, using the KU100 measurements
from Cologne University of Applied Sciences and (optionally) their headphone
equalization curves, is implemented by the IEM BinauralDecoder, see Fig. 4.37.

Combining both IEM RoomEncoder and IEM BinauralDecoder with an
Ambisonics-encoded single-channel sound (e.g. using ambix_encoder), one can
simply try to place the source and receiver together in the symmetry plane of the
room, and then slightly shift one of them sideways to observe how externalization
improves by slight asymmetry in the ear signals.
Chapter 5
Signal Flow and Effects in Ambisonic
Productions


This system offers significantly enhanced spatialisation 
technologies [...] with new creative possibilities opening up to 
anyone with the appropriate number of audio channels 
available on their computer systems. 


Dave G. Malham [1], ICMC, Beijing, 1999.


Abstract This chapter presents the internal working principles of various Ambisonic 
3D audio effects. No matter which digital audio workstation or processing soft- 
ware is used in a production, the general Ambisonic signal infrastructure is out- 
lined as an important overview of the signal processing chain. The effects presented 
are frequency-independent effects such as directional re-mapping (mirror, rotation, 
warping) and re-weighting (directional level modification), and frequency-dependent 
effects such as widening/distance/diffuseness, diffuse reverberation, and resolution- 
enhanced convolution reverberation. 


The typical audio processing steps for Ambisonic surround-sound signal 
manipulation are shown in the block diagram Fig. 5.1 from [2]. The description of 
the multi-venue application in [3] and one for live effects [4] might be encouraging. 


Ambisonic encoding and Ambisonic bus. From the previous section we know that
representing single-channel signals $s_c(t)$ together with their direction $\theta_c$ is
a matter of encoding, i.e. of multiplying the signal by the coefficients $y_N(\theta_c)$
obtained by evaluating the spherical harmonics at the direction from which the signal
should appear to come. In productions, there will be multiple signals, representing
either spot microphones or virtual playback spots of embedded channel-based content
(beds), e.g. stereo or 5.1 material. With all input signals encoded and summed up on an
Ambisonic bus, we obtain the multi-channel Ambisonic signal representation of an entire
audio production

$$\chi_N(t) = \sum_{c=1}^{C} y_N(\theta_c)\, s_c(t). \tag{5.1}$$

Fig. 5.1 Block diagram as in [2]


Ambisonic surround-sound signal. Without decoding to a specific loudspeaker layout,
the signal $\chi_N$ of the Ambisonic bus might appear somewhat virtual. Nevertheless,
it can be drawn as a surround-sound signal $x(\theta, t)$ whose amplitude can be
evaluated and metered at any direction $\theta$, anytime $t$, using the expansion into
spherical harmonics

$$x(\theta, t) = y_N(\theta)^T\, \chi_N(t). \tag{5.2}$$
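Equations (5.1) and (5.2) can be tried out directly at first order. The sketch below assumes ACN channel ordering and N3D normalization, for which the first-order harmonics are simply 1 and √3 times the Cartesian components of the unit direction vector; the function names are made up for this illustration.

```python
import numpy as np

def sh_first_order(azi, ele):
    """First-order real spherical harmonics (assumed: ACN order, N3D normalization)."""
    x = np.cos(azi) * np.cos(ele)
    y = np.sin(azi) * np.cos(ele)
    z = np.sin(ele)
    return np.array([1.0, np.sqrt(3) * y, np.sqrt(3) * z, np.sqrt(3) * x])

def encode_bus(signals, directions):
    """Eq. (5.1): sum of y_N(theta_c) s_c(t) over all sources c."""
    return sum(np.outer(sh_first_order(azi, ele), s)
               for s, (azi, ele) in zip(signals, directions))

def meter(bus, azi, ele):
    """Eq. (5.2): surround-sound signal x(theta, t) = y_N(theta)^T chi_N(t)."""
    return sh_first_order(azi, ele) @ bus

t = np.arange(480) / 48000.0
s1 = np.sin(2 * np.pi * 440 * t)                        # one spot-microphone signal
bus = encode_bus([s1], [(np.deg2rad(30.0), 0.0)])       # encoded at 30° azimuth
front = meter(bus, np.deg2rad(30.0), 0.0)               # metered at the source direction
back = meter(bus, np.deg2rad(210.0), 0.0)               # metered at the opposite direction
```

With this normalization the metered signal at the source direction is four times the input and the opposite direction shows minus two times the input, which illustrates how coarse the directional resolution of first order is.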


Upmixing. As first-order recordings are not highly resolved, there are several works 
on algorithms with resolution enhancement strategies that re-assign time-frequency
bins more sharply to directions. A good summary on such input-specific insert effects 
has been given in the book [5, 6]. Available solutions are DirAC, HOA-DirAC, 
COMPASS, Harpex. 


Higher order. Higher-order microphones require more of the acoustic holophonic 
and holographic basics than presented above, yielding pre-processing filters as input- 
specific insert effect. Higher-order recording is dealt with in the subsequent Chap. 6 
after the derivation of the wave equation and the solutions in the spherical coordinate 
system. 


Insert effects: Generic re-mapping and leveling. One can imagine that it should
be possible to manipulate the surround-sound signal $x(\theta, t)$ in various ways. For
instance, effects based on directional re-mapping can take signals out of their original
directional range and place them back into the Ambisonic signal at manipulated 
directions. Also, directions can be altered in amplitude levels so that, for instance, 
signals at directions with unwanted content undergo attenuation. Many more useful 
effects are presented below. 


Decoding to loudspeakers/headphones. To map the modified Ambisonic signal
$\tilde{\chi}_{\tilde{N}}$ to loudspeakers or headphones, an Ambisonic decoder is needed
as discussed in the previous chapter. For decoding to headphones, one should consider
either decoding to as few HRIR directions as possible [7, 8] before signals get
convolved and mixed, which avoids coloration at frontal directions where delays in the
HRIRs change too strongly over the direction to get resolved properly [9, 10], or using
the approach in [11], which proposed removal of the HRIR delay at high frequencies and
diffuse-field covariance equalization by a 2 × 2 filter system, cf. Sect. 4.11.


5.1 Embedding of Channel-Based, Spot-Microphone, 
and First-Order Recordings 


Microphone arrays for near-coincident higher-order Ambisonic recording based on 
holography will be discussed in the subsequent chapter. Nevertheless it is possible to
use (i) spot and close microphones and encode their direction into the directional 
panorama, (ii) first-order microphone arrays to fill the Ambisonic channels only up 
to the first order, (iii) more classical non-coincident or equivalence-stereophonic 
microphone arrays whose typical playback directions are encoded in Ambisonics. 

The study by Kurz et al. [12] investigated how recordings by first-order encoding of
the Soundfield ST450 and the Oktava MK4012 tetrahedral microphone arrays compare
to the equivalence-stereophonic ORTF, see Fig. 5.2. In addition, ORTF-like mapping of
the Oktava MK4012’s frontal signals to the ±30° directions in 5th order was tested
instead of its first-order encoding. Figure 5.3 shows the results of the study in terms
of the perceptual attributes localization and spatial depth. It seems that a mixture
between ORTF-like 5th-order encoding and first-order encoding of the MK4012 microphone
achieves preferred results, while the first-order encoded output of the ST450 Soundfield
microphone is rated fair in both attributes, and the ORTF microphone only ranked well
in terms of localization. The results of the ST450 were independent of its orientation,
whereas the localization of the first-order-encoded MK4012 was found to depend on the
orientation. This dependency of the MK4012 arises because its microphones are not
sufficiently coincident.

As a bottom line of the detailed analysis, one should be encouraged to keep using
classical microphone techniques where known to be appropriate and encode their
output in higher-order beds or virtual playback directions. However, this should be
done with the awareness that stereophonic recording won’t necessarily work for a
large audience area, for which the robustness in directional mapping of
equivalence-based techniques seems to be attractive.

Fig. 5.2 Ensemble of the Ambisonic and reference microphones of the study by Kurz et al. [12];
the pixelized microphone prototype by AKG was excluded from the study

Fig. 5.3 Median values and 95% confidence intervals for each attribute from experiments in [12]
for different microphones, orientations, and playback processing

An interesting layout is, e.g., specified in Hendrickx et al.’s work [13], in which they
use an equivalence-stereophonic six-channel microphone array. Another interesting
idea was used in the ICSA Ambisonics Summer School 2017: a height layer of
suitably inclined super-cardioid microphones was added at a small vertical distance
to the horizontal microphone layer, similar to the upwards-pointing directional
microphones suggested in Lee’s and Wallis’ work [14, 15], to provide sufficiently
attenuated horizontal sounds to the height layer.
Fig. 5.4 Median values and 95% confidence intervals of listening experiment comparing channel-
based orchestra recordings on headphone playback, either directly rendered using the corresponding 
HRIRs or via binaural Ambisonic decoding of different orders 


Binaural rendering study using surround-with-height material. In another study 
by Lee, Frank, and Zotter [16], static headphone-based rendering of channel-based 
recordings was compared using direct HRIR-based rendering or Ambisonics-based 
binaural rendering, cf. Sect. 4.11. The aim was to find whether differently recorded 
material could be rendered at high quality via binaural Ambisonics renderers, or under 
which settings this would imply quality degradation when compared to channel-based 
binaural rendering. 

The results from the half of the listening experiment done in Graz are analyzed in
Fig. 5.4, and the renderers compared were channel-based “ref”, a low-passed mono
anchor designed to have poor quality “0”, a first-order binaural Ambisonic renderer
“1c” based on a cube layout with loudspeakers at 90°, +270° azimuth and ±35.3°
elevation, and MagLS binaural Ambisonic renderers at the orders “1”, “2”, “3”, “4”,
and “5”. For orders 2 and above, there is not much quality degradation
compared to the reference channel-based binaural rendering. The spatial quality
cannot be distinguished from the reference for MagLS with Ambisonic orders 3 and
above, and the timbral qualities cannot be distinguished for Ambisonic orders 2 and
above.

While this result simplifies the practical requirements for headphone playback 
remarkably, it can be supposed that due to the limited sweet spot size, loudspeaker 
playback would still require higher orders, typically. 


5.2 Frequency-Independent Ambisonic Effects 


Many frequency- and time-independent Ambisonic effects are based on the
aforementioned re-mapping of directions and manipulation of directional amplitudes, see
e.g. Kronlachner’s thesis [2, 17]; advanced effects can be found in [18]. In general,
the surround-sound signal can be manipulated by any conceivable transformation
that modifies the directional mapping and amplitude of its contents. The formulation
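$$\tilde{x}(\tilde{\theta}, t) = g(\theta)\, x(\theta, t) \tag{5.3}$$

expresses an operation that is able to pick out every direction $\theta$ of the input
signal, weight its signal by a directional gain $g(\theta)$, and re-map it to a new
direction $\tilde{\theta} = \tau\{\theta\}$ within a transformed signal $\tilde{x}$. To
find out how this affects Ambisonic signals, we write both $x$ and $\tilde{x}$ as
Ambisonic signals $x(\theta, t) = y_N(\theta)^T \chi_N(t)$ and
$\tilde{x}(\tilde{\theta}, t) = y_{\tilde{N}}(\tilde{\theta})^T \tilde{\chi}_{\tilde{N}}(t)$
expanded in spherical/circular harmonics,

$$y_{\tilde{N}}(\tilde{\theta})^T\, \tilde{\chi}_{\tilde{N}}(t) = g(\theta)\, y_N(\theta)^T\, \chi_N(t),$$

and use $\int_{\mathbb{S}^2} y_{\tilde{N}}(\tilde{\theta})\, y_{\tilde{N}}(\tilde{\theta})^T\, \mathrm{d}\tilde{\theta} = I$
by integrating over $\int_{\mathbb{S}^2} y_{\tilde{N}}(\tilde{\theta})\, \mathrm{d}\tilde{\theta}$
to get $\tilde{\chi}_{\tilde{N}}(t)$ on the left

$$\tilde{\chi}_{\tilde{N}}(t) = \underbrace{\int_{\mathbb{S}^2} y_{\tilde{N}}(\tilde{\theta})\, g(\theta)\, y_N(\theta)^T\, \mathrm{d}\tilde{\theta}}_{:=\,T}\; \chi_N(t) = T\, \chi_N(t) \tag{5.4}$$

to find the transformed signals being just re-mixed Ambisonic input signals by the
matrix T (note that it might require an increased Ambisonic order $\tilde{N}$). Numerical
evaluation of the matrix
$T = \int_{\mathbb{S}^2} y_{\tilde{N}}(\tilde{\theta})\, g(\theta)\, y_N(\theta)^T\, \mathrm{d}\tilde{\theta}$
is best done by using a high-enough t-design $\tilde{\Theta} = [\tilde{\theta}_l]$ to
discretize the integration variable $\tilde{\theta} = \tau\{\theta\}$. For the
discretized directions, an inverse mapping $\theta = \tau^{-1}\{\tilde{\theta}\}$ of the
output direction must exist (directional re-mapping must be bijective), so that we can
write

$$T = \int_{\mathbb{S}^2} y_{\tilde{N}}(\tilde{\theta})\, g(\tau^{-1}\{\tilde{\theta}\})\, y_N(\tau^{-1}\{\tilde{\theta}\})^T\, \mathrm{d}\tilde{\theta} = \frac{4\pi}{L}\, Y_{\tilde{N},\tilde{\Theta}}\; \mathrm{diag}\{g_{\tau^{-1}\{\tilde{\Theta}\}}\}\; Y^T_{N,\tau^{-1}\{\tilde{\Theta}\}}.$$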
Sb 


This formalism is generic and covers simplistic and more complex tasks. It helps 
understanding that every frequency-independent directional weighting and/or re- 
mapping is just re-mixing the Ambisonic signals by a matrix, as in Fig. 5.6a. 
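The discretized formalism is easiest to try in 2D, where a uniform grid of azimuth angles takes the role of the t-design. The sketch below is such a 2D analogue under assumed conventions: circular harmonics ordered as [1, √2 cos φ, √2 sin φ, ...], for which the quadrature factor is 1/L instead of 4π/L, and hypothetical function names. A pure rotation is used as the re-mapping, so the result can be checked against an encoder evaluated at the rotated angle.

```python
import numpy as np

def circular_harmonics(phi, N):
    """Rows: directions; columns: [1, √2 cos φ, √2 sin φ, ..., √2 cos Nφ, √2 sin Nφ]."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    cols = [np.ones_like(phi)]
    for m in range(1, N + 1):
        cols += [np.sqrt(2) * np.cos(m * phi), np.sqrt(2) * np.sin(m * phi)]
    return np.stack(cols, axis=1)

def transform_matrix(N_in, N_out, inverse_map, gain, L=360):
    """T = (1/L) * Y_out^T diag(g) Y_in on a uniform grid of output azimuths."""
    phi_out = 2 * np.pi * np.arange(L) / L       # discretized output directions
    phi_in = inverse_map(phi_out)                # where each output direction comes from
    Y_out = circular_harmonics(phi_out, N_out)   # (L, 2*N_out + 1)
    Y_in = circular_harmonics(phi_in, N_in)      # (L, 2*N_in + 1)
    return Y_out.T @ (gain(phi_in)[:, None] * Y_in) / L

rho = np.deg2rad(30.0)                           # rotate the scene by +30°
T_rot = transform_matrix(3, 3, lambda p: p - rho, lambda p: np.ones_like(p))
y_src = circular_harmonics(np.deg2rad(10.0), 3)[0]   # source encoded at 10°
y_rot = circular_harmonics(np.deg2rad(40.0), 3)[0]   # same source encoded at 40°
```

Applying `T_rot` to the encoded source at 10° gives the encoding at 40°, and the matrix is orthogonal, as expected for a rotation. Passing a non-constant `gain` or a non-linear `inverse_map` yields windowing or warping in the same way, typically with `N_out` larger than `N_in`.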

The ambix VST plugin suite implements several effects, e.g. in the VST plug- 
ins ambix_mirror, ambix_rotate, ambix_directional_loudness, ambix_warp. 
The sections below explain how these and other effects work inside. 


5.2.1 Mirror 


Mirroring does not actually require the generic re-mapping and re-weighting for- 
malism from above, yet. The spherical harmonics associated with the Ambisonic 
channels are shown in Fig. 4.12 and upon closer inspection one recognizes their 
symmetries, see Fig.5.5. To mirror the Ambisonic sound scene with regard to planes 
of symmetry, it is sufficient to sign-invert channels associated with odd-symmetric 
spherical harmonics as in Fig. 5.6b. Formally, the transform matrix consists of a 
diagonal matrix T = diag{c} only, with the corresponding sign-change sequence c. 
Fig. 5.5 Ambisonic signals associated with odd-symmetric spherical harmonics are sign-inverted
to mirror the sound scene. For every Cartesian axis, illustrations above show spherical
harmonics up to the third order, with the order index n organized in rows and the mode index m
in columns. Even harmonics are blurred for visual distinction
(c) odd symmetric wrt. z (top axis)


Up-down: For instance, spherical harmonics with |m| = n are even symmetric with
regard to z = 0 (up-down), and from this index on, every second harmonic in m is.
To flip up and down, it is therefore sufficient to invert the signs of odd-symmetric
spherical harmonics with regard to z = 0; they are characterized by n + m being an
odd number, or $c_{nm} = (-1)^{n+m}$.


Left-right: The sin-related spherical harmonics with m < 0 are odd-symmetric with
regard to y = 0 (left-right); therefore sign-inverting the signals with the index m < 0
exchanges left and right in the Ambisonic surround signal, i.e. $c_{nm} = (-1)^{m<0}$.
Fig. 5.6 Block diagrams of frequency-independent transformations such as re-mapping and re-
weighting (left, matrix operations), or mirroring (right, sign-only operations)


Front-back: Every odd-numbered m > 0 is odd-symmetric with regard to x = 0
(front-back), and so is every even-numbered harmonic with m < 0. Inverting the
sign of these harmonics, $c_{nm} = (-1)^{m+(m<0)}$, flips front and back in the Ambisonic
surround signal.
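The three sign sequences can be generated directly from the indices n and m. The sketch below assumes ACN channel ordering (channel index n² + n + m); the function name and the plane labels are chosen for this illustration only.

```python
import numpy as np

def mirror_signs(N, plane):
    """Sign sequence c_nm for mirroring an order-N Ambisonic signal (ACN order assumed)."""
    signs = []
    for n in range(N + 1):
        for m in range(-n, n + 1):               # ACN index is n^2 + n + m
            if plane == "up-down":
                odd = (n + m) % 2                # c_nm = (-1)^(n+m)
            elif plane == "left-right":
                odd = int(m < 0)                 # c_nm = (-1)^(m<0)
            elif plane == "front-back":
                odd = (m + int(m < 0)) % 2       # c_nm = (-1)^(m+(m<0))
            else:
                raise ValueError(f"unknown plane: {plane}")
            signs.append(-1.0 if odd else 1.0)
    return np.array(signs)

signs_ud = mirror_signs(1, "up-down")      # first order, channels W, Y, Z, X
signs_lr = mirror_signs(1, "left-right")
signs_fb = mirror_signs(1, "front-back")
```

At first order this inverts exactly one channel per mirror plane: Z for up-down, Y for left-right and X for front-back. Mirroring a signal block `chi` of shape (channels, samples) is then `signs[:, None] * chi`.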


5.2.2 3D Rotation 


Rotation can be expressed by a general rotation matrix R consisting of a rotation
around z by $\chi$, around y by $\vartheta$, and again around z by $\varphi$, see
Fig. 5.7. This rotation matrix maps every direction $\theta$ to a rotated direction
$\tilde{\theta}$:

$$\tilde{\theta} = R(\varphi, \vartheta, \chi)\, \theta, \tag{5.5}$$

$$R = \begin{bmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\vartheta & 0 & -\sin\vartheta \\ 0 & 1 & 0 \\ \sin\vartheta & 0 & \cos\vartheta \end{bmatrix} \begin{bmatrix} \cos\chi & -\sin\chi & 0 \\ \sin\chi & \cos\chi & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Using this as a transform rule $\tau\{\theta\} = R\,\theta$ with neutral gain
$g(\theta) = 1$, we find the transform matrix by the inverse mapping
$\theta = R^T \tilde{\theta}$ as

$$T = \frac{4\pi}{L}\, Y_{N,\tilde{\Theta}}\; Y^T_{N,R^T\tilde{\Theta}}. \tag{5.6}$$

Fig. 5.7 zyz-Rotation on the plain example of great-circle navigation of a paper plane around the
earth. With the original location at the zenith, a first rotation around z determines the course, and
the subsequent rotations around y and z relocate the plane in zenith and azimuth


Using the L directions of a $t \geq 2N$-design $\tilde{\Theta}$ is sufficient to sample
the harmonics accurately. With the resulting T, rotation is implemented as in Fig. 5.6a.

There is plenty of potential for simplification: As only the spherical harmonics of
a given order n are required to re-express a rotated spherical harmonic of the same
order n, T is actually block diagonal, $T = \mathrm{blkdiag}_n\{T_n\}$, and within each
spherical harmonic order, the integral could be more efficiently evaluated using a
smaller $t \geq 2n$-design. Moreover, there are various fast and recursive ways to
calculate the entries of T as in [19-25] and implemented in most plugins. And yet, in
practice a naïve implementation can be fast enough and pragmatic.


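Rotation around z. One special case of rotation is important and particularly simple
to implement. A directional encoding in azimuth always either is equal to
$\Phi_m(\varphi_s)$ in 2D, or contains it in 3D. For $m > 0$, the azimuth encoding
$\Phi_m(\varphi_s)$ depends on $\cos(m\varphi_s)$, and its negative-sign version
$\Phi_{-m}(\varphi_s)$ depends on $\sin(|m|\varphi_s)$. The encoding angle can be offset
by the trigonometric addition theorems. They can be written as a matrix:

$$\begin{bmatrix} \sin m(\varphi_s + \rho) \\ \cos m(\varphi_s + \rho) \end{bmatrix} = \underbrace{\begin{bmatrix} \cos m\rho & \sin m\rho \\ -\sin m\rho & \cos m\rho \end{bmatrix}}_{R(m\rho)} \begin{bmatrix} \sin m\varphi_s \\ \cos m\varphi_s \end{bmatrix}. \tag{5.7}$$

By this, any Ambisonic signal, be it 2D or 3D, can be rotated around z by the matrices
$R(m\rho)$ for the signal pairs with $\pm m$,

$$\begin{bmatrix} \Phi_{-m}(\varphi_s + \rho) \\ \Phi_{m}(\varphi_s + \rho) \end{bmatrix} = R(m\rho) \begin{bmatrix} \Phi_{-m}(\varphi_s) \\ \Phi_{m}(\varphi_s) \end{bmatrix}, \qquad \begin{bmatrix} Y_n^{-m}(\varphi_s + \rho, \vartheta_s) \\ Y_n^{m}(\varphi_s + \rho, \vartheta_s) \end{bmatrix} = R(m\rho) \begin{bmatrix} Y_n^{-m}(\varphi_s, \vartheta_s) \\ Y_n^{m}(\varphi_s, \vartheta_s) \end{bmatrix}. \tag{5.8}$$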


Figure 5.8a shows the processing scheme implementing only the non-zero entries 
of the associated matrix operation T. Combined with a fixed set of 90° rotations 
around y (read from files), it can be used to access all rotational degrees of freedom 
in 3D [20]. 

The rotation effect is one of the most important features when using head-tracked 
interactive VR playback for headphones. Here, rotation counteracting the head move- 
ment has the task to support the impression of a static image of the virtual outside 
world. 
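A rotation around z of a 3D Ambisonic signal can be sketched by placing the 2 × 2 blocks R(mρ) of Eq. (5.7) at the channel pairs (n, −m) and (n, m). ACN ordering, sin-type harmonics for m < 0, and the first-order N3D encoder used for the check are assumptions of this sketch; for head tracking, the matrix would be built with the negative head-rotation angle.

```python
import numpy as np

def rotate_z_matrix(N, rho):
    """Transform matrix T for rotating an order-N signal around z by rho (ACN assumed)."""
    T = np.eye((N + 1) ** 2)
    for n in range(1, N + 1):
        for m in range(1, n + 1):
            i_sin = n * n + n - m                # channel (n, -m), sin-type
            i_cos = n * n + n + m                # channel (n, +m), cos-type
            c, s = np.cos(m * rho), np.sin(m * rho)
            T[i_sin, i_sin], T[i_sin, i_cos] = c, s     # first row of R(m rho)
            T[i_cos, i_sin], T[i_cos, i_cos] = -s, c    # second row of R(m rho)
    return T

def encode_horizontal_first_order(phi):
    """First-order encoder on the horizon (assumed ACN/N3D): [W, Y, Z, X]."""
    return np.array([1.0, np.sqrt(3) * np.sin(phi), 0.0, np.sqrt(3) * np.cos(phi)])

T1 = rotate_z_matrix(1, np.deg2rad(30.0))
y_rotated = T1 @ encode_horizontal_first_order(np.deg2rad(10.0))
y_expected = encode_horizontal_first_order(np.deg2rad(40.0))
T3 = rotate_z_matrix(3, np.deg2rad(30.0))
```

Only the diagonal and the entries coupling each ±m pair are non-zero, which is why the processing scheme needs far fewer multiplications than a full matrix.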


5.2.3 Directional Level Modification/Windowing 


What might be most important when mixing is the option to treat the gains of different
directions differently: it might be necessary to attenuate directions of uninteresting
or disturbing content while boosting directions of a soft target signal. For such a
manipulation there is a neutral directional re-mapping $\tilde{\theta} = \theta$, and the
transform to define the matrix T that is implemented as in Fig. 5.6a remains

$$T = \frac{4\pi}{L}\, Y_{\tilde{N},\Theta}\; \mathrm{diag}\{g_{\Theta}\}\; Y^T_{N,\Theta}. \tag{5.9}$$

Fig. 5.8 Rotation around z and Ambisonic widening/diffuseness apply simple 2 × 2 rotation matri-
ces/filter matrices to each Ambisonic signal pair $\chi_{n,m}$, $\chi_{n,-m}$ of the same order n.
Note that the order of the input/output channels plotted is not the typical ACN sequence to avoid
crossing connections and hereby simplify the diagram


In the simplest version, as implemented in ambix_directional_loudness, the
gain function just consists of two mutually exclusive regions, e.g. within a region
of diameter $\alpha$ around the direction $\theta_c$, and a complementary region outside,
with separately controlled gains $g_\mathrm{in}$ and $g_\mathrm{out}$:

$$g(\theta) = g_\mathrm{in}\, u\!\left(\theta^T \theta_c - \cos\tfrac{\alpha}{2}\right) + g_\mathrm{out}\, u\!\left(\cos\tfrac{\alpha}{2} - \theta^T \theta_c\right), \tag{5.10}$$

where u(x) represents the unit-step function that is 1 for x > 0 and 0 else. Note that
the Ambisonic order of this effect will need to be larger to be lossless. However,
with reasonably chosen sizes $\alpha$ and gain ratios $g_\mathrm{in}/g_\mathrm{out}$, the
effect will nevertheless produce reasonable results. Figure 5.9 shows a window at
azimuth and elevation at 22.5° with an aperture of 50° using $g_\mathrm{in} = 1$ and
$g_\mathrm{out} = 0$ and the order of N = 10 with a grid of encoded directions to
illustrate the influence of the transformation.
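The gain function of Eq. (5.10) is simple to evaluate for a set of unit direction vectors, for instance the points of the t-design that discretizes Eq. (5.9). The function name and the example values below are assumptions of this sketch and are not taken from ambix_directional_loudness.

```python
import numpy as np

def window_gain(dirs, center, alpha, g_in=1.0, g_out=0.0):
    """Eq. (5.10): g_in inside a cap of angular diameter alpha around center, g_out outside.

    dirs   : (L, 3) unit direction vectors
    center : (3,) unit vector of the window center
    alpha  : angular diameter of the window in radians
    """
    cos_angle = np.atleast_2d(dirs) @ np.asarray(center, dtype=float)
    inside = cos_angle - np.cos(alpha / 2) > 0    # u(theta^T theta_c - cos(alpha/2))
    outside = np.cos(alpha / 2) - cos_angle > 0   # u(cos(alpha/2) - theta^T theta_c)
    return g_in * inside + g_out * outside

def unit_vector(azi_deg):
    """Horizontal unit vector for an azimuth given in degrees."""
    a = np.deg2rad(azi_deg)
    return np.array([np.cos(a), np.sin(a), 0.0])

dirs = np.stack([unit_vector(0.0), unit_vector(20.0), unit_vector(30.0)])
gains = window_gain(dirs, unit_vector(0.0), np.deg2rad(50.0), g_in=1.0, g_out=0.25)
```

With a 50° window around the front, the directions at 0° and 20° fall inside (less than 25° off center) and the one at 30° falls outside. The resulting vector is what enters `diag{g}` in Eq. (5.9).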

For reference: entries of the tensor used to analytically re-expand the product
of two spherical functions $x(\theta)\, g(\theta)$ given by their spherical harmonic
coefficients $\chi_{nm}$, $\gamma_{nm}$ are called Gaunt coefficients or Clebsch-Gordan
coefficients [6, 26].
(a) original (b) windowed

Fig. 5.9 Directionally windowed Ambisonic test image at every 90° in azimuth, interleaved in
azimuth for the elevations ±60° and ±22.5°, using the order $\tilde{N} = N = 10$, a window size of
$\alpha = 50°$ around azimuth and elevation of 0° and max-$r_E$ weighting


5.2.4 Warping 


Gerzon [27, Eq. 4a] described the effect dominance that is meant to warp the 
Ambisonic surround scene to modify how vitally the essential parts in front of the 
scene are presented. 

Warping wrt. a direction. For mathematical simplicity, we describe this bilinear
warping with regard to the z direction. To warp with regard to the frontal direction, one
first rotates the front upwards, applies the warping operation there, and then rotates
back. The bilinear warping modifies the normalized z coordinate
$\zeta = \cos\vartheta = \theta_z$, so that signals from the horizon $\zeta = 0$ are
pulled to $\tilde{\zeta} = \alpha$,

$$\tilde{\zeta} = \frac{\alpha + \zeta}{1 + \alpha\,\zeta}, \tag{5.11}$$

while keeping for the poles $\tilde{\zeta} = \pm 1$ what was originally there,
$\zeta = \pm 1$. Hereby, the surround signal gets squeezed towards or stretched away
from the zenith, or when rotating before and after: towards/from any direction.

The integral can be discretized and solved by a suitable t-design as before, only
that for lossless operation, the output order $\tilde{N}$ must be higher than the input
order N. We get a matrix T that is implemented as in Fig. 5.6a and computed by

$$T = \frac{4\pi}{L}\, Y_{\tilde{N},\tilde{\Theta}}\; \mathrm{diag}\{g_{\tau^{-1}\{\tilde{\Theta}\}}\}\; Y^T_{N,\tau^{-1}\{\tilde{\Theta}\}}. \tag{5.12}$$


The inverse mapping yields

$$\zeta = \tau^{-1}(\tilde{\zeta}) = \frac{\tilde{\zeta} - \alpha}{1 - \alpha\,\tilde{\zeta}}, \tag{5.13}$$

and it modifies the coordinates of the t-design inserted for
$\tilde{\theta}_l = [\tilde{\theta}_{\mathrm{x},l},\, \tilde{\theta}_{\mathrm{y},l},\, \tilde{\theta}_{\mathrm{z},l}]^T$
with $\tilde{\zeta}_l = \tilde{\theta}_{\mathrm{z},l}$ accordingly.

(a) original (b) warped

Fig. 5.10 Warping of the horizontal plane by 22.5° downwards; original Ambisonic test image
contains points at every 90° in azimuth, interleaved in azimuth for the elevations ±60° and ±22.5°;
orders are $\tilde{N} = N = 10$; max-$r_E$ weighted
The gain $g(\tilde{\zeta})$ of the generic transformation is useful to preserve the
loudness of what becomes wider and therefore louder in terms of the E measure after
re-mapping. To preserve loudness, the resulting surround signal is divided by the square
root of the stretch applied, which is related to the slope of the mapping by
$\frac{\mathrm{d}\zeta}{\mathrm{d}\tilde{\zeta}} = \frac{1-\alpha^2}{(1-\alpha\tilde{\zeta})^2}$.
Expressed as de-emphasis gain, we get

$$g(\tilde{\zeta}) = \frac{1 - \alpha\,\tilde{\zeta}}{\sqrt{1 - \alpha^2}} = \frac{1 - \alpha\,\tilde{\theta}_\mathrm{z}}{\sqrt{1 - \alpha^2}}. \tag{5.15}$$

Figure 5.10 shows warping of the horizontal plane by 20° downwards, using the test
image parameters as with windowing; de-emphasis attenuates widened areas.

In the same fashion, Kronlachner [17] describes another warping curve that warps
with regard to fixed horizontal plane and pole, either squeezing or stretching the
content towards or away from the horizon, symmetrically for both the upper and
lower hemispheres (second option of the ambix_warp plugin).
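The bilinear warping curve, its inverse, and the de-emphasis gain are short enough to check numerically. In the sketch below, the function names are made up, and the choice α = sin(−22.5°) is one reading of "warping the horizon 22.5° downwards" (ζ is the sine of the elevation angle); both are assumptions of this illustration.

```python
import numpy as np

def warp(zeta, alpha):
    """Eq. (5.11): forward mapping of the normalized z coordinate."""
    return (alpha + zeta) / (1.0 + alpha * zeta)

def unwarp(zeta_w, alpha):
    """Eq. (5.13): inverse mapping, used to look up the input direction."""
    return (zeta_w - alpha) / (1.0 - alpha * zeta_w)

def deemphasis_gain(zeta_w, alpha):
    """De-emphasis gain as a function of the warped coordinate."""
    return (1.0 - alpha * zeta_w) / np.sqrt(1.0 - alpha ** 2)

alpha = np.sin(np.deg2rad(-22.5))      # assumed: horizon pulled to -22.5° elevation
zeta = np.linspace(-1.0, 1.0, 9)       # input z coordinates from nadir to zenith
zeta_w = warp(zeta, alpha)             # warped z coordinates

# Numerical slope of the forward mapping at one test point, for comparison with the gain.
z0, eps = 0.3, 1e-6
slope = (warp(z0 + eps, alpha) - warp(z0 - eps, alpha)) / (2 * eps)
gain_sq = deemphasis_gain(warp(z0, alpha), alpha) ** 2
```

The poles stay in place, the horizon lands at α, and the squared gain equals the slope of the forward mapping at the corresponding input coordinate, which is the reciprocal of the slope of the inverse mapping.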


5.3 Parametric Equalization 


There are two ways of applying parametric equalizers to Ambisonic channels.
Either a single-/multi-channel input of a mono encoder or a multiple-input encoder
is filtered by parametric equalizers, or each of the Ambisonic signal’s channels is
filtered by the same parametric equalizer, see Fig. 5.11a.

(a) Filtering/EQ of Ambisonic signals (b) Dynamic processing of Ambisonic signals


Fig. 5.11 Block diagram of processing that commonly and equally affects all Ambisonic signals, 
such as parametric equalization and dynamic processing (compression), without recombining the 
signals 


Bass management is often important to not overdrive smaller loudspeaker systems
of, e.g., a 5th-order hemispherical playback system with subwoofer signals: All
36 channels from the Ambisonic bus can be sent to a decoder section, in which
frequencies below 70–100 Hz are low-cut by a 4th-order filter before running through
the Ambisonic decoder, while the first channel from the Ambisonics bus alone, the
omnidirectional channel, is sent to a subwoofer section, in which a 4th-order
high-cut filter removes the frequencies above 70–100 Hz before the signal is sent
to the subwoofers. If the playback system is time-aligned between subwoofer and
higher frequencies, the 4th-order crossovers should be Linkwitz–Riley filters
(squared Butterworth high-pass and low-pass filters, respectively) to preserve phase
equality [28].
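Why squared Butterworth filters are a good choice here can be verified with the analog prototype responses. The sketch below evaluates a 4th-order Linkwitz–Riley pair on the frequency axis; the crossover at 80 Hz and the function name are assumptions, and a real system would of course use digital filters derived from these prototypes.

```python
import numpy as np

def lr4_responses(f, f_c):
    """Analog 4th-order Linkwitz-Riley low-pass/high-pass responses at frequencies f (Hz)."""
    s = 1j * np.asarray(f, dtype=float) / f_c     # normalized Laplace variable on the jω axis
    den = s ** 2 + np.sqrt(2) * s + 1             # 2nd-order Butterworth denominator
    h_lp = (1.0 / den) ** 2                       # squared Butterworth low-pass
    h_hp = (s ** 2 / den) ** 2                    # squared Butterworth high-pass
    return h_lp, h_hp

f = np.logspace(1, 4, 200)                        # 10 Hz to 10 kHz
h_lp, h_hp = lr4_responses(f, 80.0)               # assumed crossover frequency of 80 Hz
h_sum = h_lp + h_hp                               # acoustic sum of subwoofer and main paths
h_lp_fc, _ = lr4_responses(np.array([80.0]), 80.0)
```

The sum of both branches has unit magnitude at all frequencies, both branches are in phase everywhere, and each is 6 dB down at the crossover frequency.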

For more information on parametric equalizers, the reader is referred to Udo 
Zölzer’s book on Digital audio effects [29]. 


5.4 Dynamic Processing/Compression 


Individual compression of different Ambisonic channels would destroy the
directional consistency of the Ambisonic signal. Consequently, dynamic processing
should rather affect the levels of all Ambisonic channels in the same way. As it
typically contains all the audio signals, it is useful to have the first, omnidirectional
Ambisonic channel control the dynamic processor as side-chain input, see Fig. 5.11b.
For more information on dynamic processing, the reader is referred to Udo Zölzer’s
book on Digital audio effects [29].

Moreover, it is sometimes useful to compress the vocals of a singer separately. 
To this end, the directional compression would first extract a part of the Ambisonic 
signals by a directional window, creating one set of Ambisonic signals without the 
directional region of the window, and another one exclusively containing it. The 
compression is applied on the resulting window signal before re-combining it with 
the rest signals. 
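The side-chain principle can be sketched as follows: a gain curve is derived from the omnidirectional channel only and then applied identically to all channels. The compressor itself (instant attack, exponential release, hard knee) and all parameter values are simplifying assumptions of this sketch, not a description of a particular plugin.

```python
import numpy as np

def compress_ambisonic(bus, threshold_db=-20.0, ratio=4.0, fs=48000, release_ms=100.0):
    """Compress all Ambisonic channels with one gain derived from channel 0 (side chain).

    bus : (channels, samples) Ambisonic signal, channel 0 is the omnidirectional one
    Returns the compressed bus and the applied gain curve.
    """
    w = np.abs(bus[0])
    coeff = np.exp(-1.0 / (fs * release_ms * 1e-3))
    env = np.zeros_like(w)
    level = 0.0
    for i, v in enumerate(w):
        level = max(v, coeff * level)             # instant attack, exponential release
        env[i] = level
    level_db = 20.0 * np.log10(np.maximum(env, 1e-9))
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain = 10.0 ** (-over_db * (1.0 - 1.0 / ratio) / 20.0)
    return bus * gain, gain                       # same gain for every channel

# Constant test signal at 0 dBFS in the omnidirectional channel.
bus = np.ones((4, 100)) * np.array([1.0, 0.5, 0.2, 0.1])[:, None]
out, gain = compress_ambisonic(bus)
```

For the constant test signal the level is 20 dB above threshold, so a ratio of 4 gives 15 dB of gain reduction on every channel, and the ratios between channels, which carry the directional information, remain untouched.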
5.5 Widening (Distance/Diffuseness/Early Lateral
Reflections)


Basic widening and diffuseness effects can be regarded as being inspired by Gerzon 
[30] and Laitinen [31] who proposed to apply frequency-dependent panning filters, 
mapping different frequencies to directions dispersed around the panning direction. 
The resulting effect is fundamentally different from and superior to increasing the 
spread in frequency-independent MDAP with enlarged spread or Ambisonics with 
reduced order, which could yield audible comb filtering. 

To apply this technique to Ambisonics, Zotter et al. [32] proposed to employ a
dispersive, i.e. frequency-dependent, rotation of the Ambisonic scene around the
z-axis as in Eq. (5.8) by the matrix $\boldsymbol{R}$ as described above and in Fig. 5.8b, using
2 × 2 matrices of filters to implement the frequency-dependent argument
$m\hat\phi\cos\omega\tau$

$$\boldsymbol{R}(m\hat\phi\cos\omega\tau) = \begin{bmatrix} \cos(m\hat\phi\cos\omega\tau) & \sin(m\hat\phi\cos\omega\tau) \\ -\sin(m\hat\phi\cos\omega\tau) & \cos(m\hat\phi\cos\omega\tau) \end{bmatrix}, \tag{5.16}$$

whose parameters $\hat\phi$ and $\tau$ allow controlling the magnitude and change rate of the
rotation with increasing frequency. How this filter matrix is implemented efficiently 
was described in [33], where a sinusoidally frequency-varying pair of functions

$$g_1(\omega) = \cos[\alpha\cos(\omega\tau)], \qquad g_2(\omega) = \sin[\alpha\cos(\omega\tau)], \tag{5.17}$$

was found to correspond to the sparse impulse responses in the time domain

$$\begin{aligned}
g_1(t) &= \sum_{q=-\infty}^{\infty} J_{|q|}(\alpha)\,\cos\!\left(\tfrac{\pi}{2}|q|\right)\,\delta(t - q\tau), \\
g_2(t) &= \sum_{q=-\infty}^{\infty} J_{|q|}(\alpha)\,\sin\!\left(\tfrac{\pi}{2}|q|\right)\,\delta(t - q\tau),
\end{aligned} \tag{5.18}$$

allowing for truncation to just a few terms in $q$, typically 11 taps with $-5 \le q \le 5$
or fewer, and hereby an efficient implementation. For the implementation of the filter
matrix, the value $\alpha = m\hat\phi$ is used for each degree $m$. (It might be helpful to be
reminded of a phase-modulated cosine and sine from radio communication, whose spectra are
the same functions as this impulse response pair.)
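
As a plausibility check of Eqs. (5.17) and (5.18), the following sketch computes the
tap weights for $-5 \le q \le 5$ and compares the frequency response of the truncated
filters with the target functions. The helper names, the power-series evaluation of the
Bessel functions, and the values of $\alpha$ and $\omega\tau$ are assumptions made for
illustration only.

```python
import cmath
import math

def bessel_j(n, x, terms=30):
    """Bessel function of the first kind J_n(x) for integer n >= 0 (power series)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(k + n))
               * (x / 2.0) ** (2 * k + n) for k in range(terms))

def widening_taps(alpha, q_max=5):
    """Tap weights of g1 and g2 at the delays q*tau for q = -q_max..q_max."""
    qs = list(range(-q_max, q_max + 1))
    g1 = [bessel_j(abs(q), alpha) * math.cos(math.pi / 2 * abs(q)) for q in qs]
    g2 = [bessel_j(abs(q), alpha) * math.sin(math.pi / 2 * abs(q)) for q in qs]
    return qs, g1, g2

def response(taps, qs, omega_tau):
    """Frequency response of the sparse FIR filter at the normalized frequency omega*tau."""
    return sum(g * cmath.exp(-1j * q * omega_tau) for g, q in zip(taps, qs))

alpha = 1.0   # alpha = m * phi_hat in radians (illustrative value)
qs, g1, g2 = widening_taps(alpha)
wt = 0.7      # an arbitrary value of omega * tau
err1 = abs(response(g1, qs, wt) - math.cos(alpha * math.cos(wt)))
err2 = abs(response(g2, qs, wt) - math.sin(alpha * math.cos(wt)))
```

For this moderate value of $\alpha$, the 11 taps already reproduce the target functions
closely; larger values of $\alpha$ require more taps for the same accuracy.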

As the algorithm places successive frequencies at slightly displaced directions,
the auditory source width increases. Moreover, the frequency-dependent part causes
a smearing of the temporal fine-structure in the signal. In [34], it was found that
implementations discarding the negative values of $q$, i.e. keeping $q \ge 0$, sound more
natural and still exhibit a sufficiently strong effect. Time constants $\tau$ around 1.5 ms
yield a widening effect, and a diffuseness and distance impression is obtained with $\tau$
around 15 ms. The parameter $\hat\phi$ is adjustable between 0 (no effect) and larger values.
Beyond 80° the audio quality starts to degrade. The use as diffusing effect has turned
out to be useful as simple simulation of early lateral reflections, because most parts
of the spectrum are played back near the reversal points $\pm\hat\phi$ of the dispersion
contour. For naturally sounding early reflections, additional shelving filters introducing
attenuation of high frequencies prove useful.

Figures 5.12 and 5.13 show experimental ratings of the perceived effect strength
(width or distance) of the above algorithm in [34], which was implemented as
frequency-dependent (dispersive) panning on just a few loudspeakers L = 3, 4, 5, 7
evenly arranged from −90° to 90° on the horizon at 2.5 m distance from the central
listening position. Loudspeakers were controlled by a sampling decoder of the
orders N = 1, 2, 3, 5 with the center of the max-$r_E$-weighted panning direction at
0° in front. The signal was speech, and the reference “REF” was the unprocessed
signal played back by the frontal loudspeaker. The experiment tested the algorithm with
both the symmetric impulse responses suggested by Eq. (5.18) and those truncated to their
causal side $q \ge 0$, for a listening position at the center of the arrangement (bullet
marker) and one shifted by 1.25 m to the right, off-center (square marker). Figure 5.12
indicates for the widening algorithm with $\tau$ = 1.5 ms that the perceived width saturates
for orders N > 2 at both listening positions. Although the causal-sided
implementation is weaker in effect strength, it clearly outperforms the symmetric
FIR implementation in terms of audio quality (right diagram), while still producing a
clearly noticeable effect when compared to the unprocessed reference (left diagram).

A more pronounced preference of the causal-sided implementation in terms of
audio quality is found in Fig. 5.13 for the setting $\tau$ = 15 ms, where the algorithm
is increasing the diffuseness or perceived distance for orders N > 2 at both
listening positions.

Fig. 5.12 Perceived width (left) and audio quality (right) of frequency-dependent dispersive
Ambisonic rotation as widening effect using the setting $\tau$ = 1.5 ms, the Ambisonic orders
N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions
at the center (bullet marker) and half-way right off-center (square marker)

Fig. 5.13 Perceived width (left) and audio quality (right) of frequency-dependent dispersive
Ambisonic rotation as distance/diffuseness effect using the setting $\tau$ = 15 ms, the Ambisonic orders
N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions
at the center (bullet marker) and half-way right off-center (square marker)


5.6 Feedback Delay Networks for Diffuse Reverberation 


Feedback delay networks (FDN, cf. [35, 36]) can directly be employed to create
diffuse Ambisonic reverberation. A dense response and an individual reverberation for
every encoded source can be expected when feeding the Ambisonic signals directly
into the inputs of the FDN.

As in Fig. 5.14, an FDN consists of a matrix $\boldsymbol{A}$ that is orthogonal,
$\boldsymbol{A}^{\mathrm T}\boldsymbol{A} = \boldsymbol{I}$, and
should mix the signals of the feedback loop well enough to distribute them across
all different channels to couple the resonators associated with the different delays
$\tau_i$. These delays should not have common divisors to avoid pronounced resonance
frequencies, and are therefore typically chosen to be related to prime numbers. Small
delays are typically selected to be more closely spaced, {2, 3, 5, ...} ms, to simulate
a diffuse part with densely spaced response at the beginning, and long delays further
apart often make the reverberation more interesting. Using unity factors as channel
gains $g_{\mathrm{lo}} = g_{\mathrm{mi}} = g_{\mathrm{hi}} = 1$ and any orthogonal matrix $\boldsymbol{A}$, the reverberation time becomes
infinite. For smaller channel gains, the FDN produces decaying output.


Reverberation is characterized by the exponentially decaying envelope $10^{-3t/T_{60}}$.
For a single delay of the length $\tau_i$, the corresponding gain is $g^{\tau_i}$ with $g = 10^{-3/T_{60}}$.
This factor with the corresponding exponent provides equal reverberation decay rate
in every channel, and hereby exact control of the reverberation time. To make the
effect sound natural, it is typical to adjust the gains within a high-mid-low filter set
to decrease the reverberation towards higher frequency bands by the gains
$g_{\mathrm{hi}} \le g_{\mathrm{mi}} \le g_{\mathrm{lo}}$.
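
The relation between delay length, gain, and reverberation time can be sketched in a few
lines. The delay values and the reverberation times per band are assumed example values,
and the function name is hypothetical.

```python
def feedback_gain(delay_s, t60_s):
    """Gain g**tau of one delay line so that the level drops by 60 dB after t60_s."""
    return 10.0 ** (-3.0 * delay_s / t60_s)

delays_ms = [2, 3, 5, 7, 11, 13, 17, 19]   # prime-number delays in ms (illustrative)
t60 = {"lo": 2.0, "mi": 1.5, "hi": 0.8}    # assumed reverberation times in seconds
gains = {band: [feedback_gain(d / 1000.0, t) for d in delays_ms]
         for band, t in t60.items()}

# Each pass through a delay of length tau is weighted by g**tau. After T60 seconds,
# i.e. T60/tau passes, the amplitude has dropped to 10**-3, which is -60 dB.
passes = t60["mi"] / (delays_ms[0] / 1000.0)
level_after_t60 = gains["mi"][0] ** passes
```

Since the exponent scales with the delay length, long and short delay lines decay at the
same rate per second, which is what makes the reverberation time controllable.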

Fig. 5.14 Feedback delay network (FDN) for Ambisonic reverb. The matrix $\boldsymbol{A}$ is unitary and the
gain $g = 10^{-3/T_{60}}$ to the power of the delay $\tau_i$ allows adjusting a spatially and temporally diffuse
reverberation effect in different bands (lo, mi, hi)

The vector gathering the current sample for every feedback path is multiplied
by the matrix $\boldsymbol{A}$. For calculation in real time, Rocchesso proposed to use a scaled
Hadamard matrix $\boldsymbol{A} = a\boldsymbol{H}$ of the dimensions $M = 2^k$ in [37]. The matrix $\boldsymbol{H}$ consists of ±1
entries only and hereby perfectly mixes the signal across the different feedback paths to
create a diffuse set of resonances. What is more, this not only replaces the $M \times M$
multiply-and-add operations of the matrix multiplication by sums and differences, it
moreover permits the efficient implementation as Fast Walsh-Hadamard Transform
(FWHT), a butterfly algorithm. Figure 5.15 shows a graphical implementation
example of a 16-channel FWHT in the real-time signal processing environment Pure
Data.
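
A textual counterpart to such a butterfly structure could look as follows. This is a
generic in-place FWHT in natural (Sylvester) ordering, not a transcription of the Pure
Data patch, and the scaling $a = 1/\sqrt{M}$ is the value implied by requiring an
orthogonal matrix.

```python
import math

def fwht(x):
    """Fast Walsh-Hadamard transform of a list whose length is a power of two."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:                       # log2(n) stages
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b   # one sum and one difference
        h *= 2
    return y

def mix(x):
    """Orthogonal feedback mixing A x with A = H / sqrt(M)."""
    scale = 1.0 / math.sqrt(len(x))
    return [scale * v for v in fwht(x)]

x = [float(i) for i in range(16)]
y = mix(x)
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in y)     # preserved by the orthogonal matrix
x_back = mix(y)                        # H is symmetric: mixing twice restores x
```

For 16 channels, the loop runs through 4 stages with 16 sums or differences each, which
matches the operation count stated for Fig. 5.15.
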
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Fig. 5.15 The fast Walsh-Hadamard transform (FWHT) variant implemented in the 16-channel 
feedback delay network reverberator [rev3~] in Pure Data requires only 4 x 16 sums/differences 
to replace the 16 x 16 multiplies of matrix multiplication by A 
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5.7 Reverberation by Measured Room Impulse Responses 
and Spatial Decomposition Method in Ambisonics 


The first-order spatial impulse response of a room at the listener can be improved by
resolution enhancements of the spatial decomposition method (SDM) by Tervo [38],
which is a broad-band version of spatial impulse response rendering (SIRR) by
Merimaa and Pulkki [39, 40]. For reliable measurements, loudspeakers are typically
employed, and the typical measurement signals are not impulses but swept-sine
signals that are converted to impulse responses by deconvolution. A room impulse response
is typically sparse in its beginning, where direct sound and early reflections arrive
at the measurement location. Generally, it is likely that those arrival times in the early
part do not coincide and are well separated from each other, so that one can assume
their temporal disjointness at the receiver.

From a room impulse response $h(t)$ that complies with this assumption, for which
there consequently is a direction of arrival (DOA) $\boldsymbol\theta_{\mathrm{DOA}}(t)$ for every time instant, one
could construct an Ambisonic receiver-directional room impulse response as in [41],
$h(\boldsymbol\theta_{\mathrm R}, t) = h(t)\,\delta[1 - \boldsymbol\theta_{\mathrm R}^{\mathrm T}\boldsymbol\theta_{\mathrm{DOA}}(t)]$, depending on the direction $\boldsymbol\theta_{\mathrm R}$ at the receiver.
This response can be transformed into the spherical harmonic domain by integrating
it with $\boldsymbol{y}_N(\boldsymbol\theta_{\mathrm R})\,\mathrm{d}\boldsymbol\theta_{\mathrm R}$ over the sphere $\mathbb{S}^2$, to get the set of $N$th-order Ambisonic
room impulse responses

$$\tilde{\boldsymbol{h}}_N(t) = h(t)\,\boldsymbol{y}_N[\boldsymbol\theta_{\mathrm{DOA}}(t)].$$

A signal $s(t)$ convolved by this vector of impulse responses theoretically generates
a 3D Ambisonic image of the mono sound in the room of the measurement. This
can be done, e.g., by the plug-in mcfx_convolver. Now there are two problems to
be solved: (i) how to estimate $\boldsymbol\theta_{\mathrm{DOA}}(t)$, (ii) how to deal with the diffuse part of $h(t)$,
when more than one sound arrives at a time.
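
The sample-wise encoding of the equation above can be sketched for the first order.
The sketch assumes real-valued first-order spherical harmonics in ACN order with SN3D
scaling, which may differ from the normalization used elsewhere in this book; the toy
impulse response and its directions are made-up values.

```python
import math

def sh_first_order(azimuth, elevation):
    """First-order spherical harmonics in ACN order (W, Y, Z, X), SN3D scaling."""
    return [1.0,
            math.sin(azimuth) * math.cos(elevation),
            math.sin(elevation),
            math.cos(azimuth) * math.cos(elevation)]

def sdm_encode(h, doa):
    """Return one impulse response per Ambisonic channel: h[t] * y[doa[t]]."""
    channels = [[0.0] * len(h) for _ in range(4)]
    for t, (sample, (az, el)) in enumerate(zip(h, doa)):
        for c, y in enumerate(sh_first_order(az, el)):
            channels[c][t] = sample * y
    return channels

# Toy response: direct sound from the front, one reflection from the left.
h = [1.0, 0.0, 0.0, 0.5]
doa = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (math.pi / 2, 0.0)]
h_amb = sdm_encode(h, doa)
```

The omnidirectional channel reproduces the measured response, while the other channels
carry the same samples weighted by the direction assigned to each time instant.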


Estimation of the DOA. One could now just detect the temporal peaks of the room
impulse response and assign the guessed evolution of the direction of arrival as
suggested in [42], and hereby span the envelopment of the room impulse response.
Alternatively, if the room impulse response was recorded by a microphone array as
in [38], array processing can be used to estimate the direction of arrival $\boldsymbol\theta_{\mathrm{DOA}}(t)$.
For first-order Ambisonic microphone arrays, when suitably band-limited to the
frequency range in which the directional mapping is correct, e.g. between 200 Hz
and 4 kHz, the vector $\boldsymbol{r}_{\mathrm{DOA}}$ of Eq. (A.83) in Appendix A.6.2 yields a suitable estimate

$$\boldsymbol{r}_{\mathrm{DOA}}(t) = W(t)\begin{bmatrix} X(t) \\ Y(t) \\ Z(t) \end{bmatrix} = -\rho c\, h(t)\,\boldsymbol{v}(t), \qquad \boldsymbol\theta_{\mathrm{DOA}}(t) = \frac{\boldsymbol{r}_{\mathrm{DOA}}(t)}{\|\boldsymbol{r}_{\mathrm{DOA}}(t)\|}. \tag{5.19}$$
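
A minimal sketch of Eq. (5.19) for a single sample is given below. It assumes that the
band-limited first-order signals are already available and that X, Y, Z are proportional
to W times the direction cosines; a common scaling factor between W and X, Y, Z does not
affect the normalized direction. The function name and the threshold `eps` are
hypothetical.

```python
import math

def doa_from_first_order(w, x, y, z, eps=1e-12):
    """Unit-length DOA estimate from one sample of first-order Ambisonic signals."""
    r = [w * x, w * y, w * z]                  # r_DOA before normalization
    norm = math.sqrt(sum(v * v for v in r))
    if norm < eps:
        return None                            # silent sample: no reliable direction
    return [v / norm for v in r]

# Plane-wave-like sample arriving horizontally from 90 degrees azimuth (left):
az = math.pi / 2
w = -0.8                                       # a negative pressure peak
x, y, z = w * math.cos(az), w * math.sin(az), 0.0
theta = doa_from_first_order(w, x, y, z)
```

The product with W makes the estimate independent of the sign of the pressure peak, so
positive and negative peaks of the impulse response map to the same direction.
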
Figure 5.16 shows the directional analysis of the first 100 ms of a first-order
directional impulse response taken from the openair lib (http://www.openairlib.net).
This response was measured in St. Andrew’s Church Lyddington, UK (2600 m³ volume,
11.5 m source-receiver distance) with a Soundfield SPS422B microphone.
The direct sound from the front is clearly visible, as well as strong early reflections
from front and back, and evenly distributed weak arrivals from the diffuse reverberation.

Fig. 5.16 Spatial distribution of the first 100 ms of a first-order directional impulse response
measured at St. Andrew’s Church Lyddington. Brightness and size of the circles indicate the level

Fig. 5.17 Frequency-dependent reverberation time calculated from original and


Spectral decay recovery for higher-order RIRs. The second task mentioned above
arises because the multiplication of $h(t)$ by $\boldsymbol{y}_N[\boldsymbol\theta_{\mathrm{DOA}}(t)]$ to obtain $\tilde{\boldsymbol{h}}_N(t)$ degrades the spectral
decay at higher orders. If there is no further processing, the resulting response
typically exhibits a noticeably increased spectral brightness [38, 41, 43]. This unnatural
brightness mainly affects the diffuse reverberation tail, where temporal disjointness
is a poor assumption. There, the corresponding rapid changes of $\boldsymbol\theta_{\mathrm{DOA}}(t)$ cause a
strong amplitude modulation in the pre-processing of the late room impulse response
at high Ambisonic orders. Typically, long decays of low frequencies leak into high
frequencies, and hereby result in an erroneous spectral brightening of the diffuse tail.
Figure 5.17 analyses the behavior in terms of an erroneous increase of reverberation
time at high frequencies, especially when using high orders.

In order to equalize the spectral decay and hereby the reverberation time of the
SDM-enhanced impulse response, there is a helpful pseudo-allpass property of the
spherical harmonics for direct and diffuse fields, as described in Eqs. (A.52) and
(A.55) of Appendix A.3.7. The signals in the vector $\tilde{\boldsymbol{h}}_N(t) = [\tilde{h}_n^m(t)]_{nm}$ are first
decomposed into frequency bands, yielding the sub-band responses $\tilde{h}_n^m(t, b)$. We can
equalize the spectral sub-band decay for every band $b$ and order $n$ by targeting fulfillment
of the pseudo-allpass property

$$\sum_{m=-n}^{n} E\{|h_n^m(t, b)|^2\} = (2n+1)\,E\{|h_0^0(t, b)|^2\}. \tag{5.20}$$

The formulation above relies on the correct spectral decay of the omnidirectional
signal $h(t, b) = h_0^0(t, b)$, which is unaffected by modulation. Correction is achieved
by

$$h_n^m(t, b) = \tilde{h}_n^m(t, b)\,\sqrt{\frac{(2n+1)\,E\{|h_0^0(t, b)|^2\}}{\sum_{m=-n}^{n} E\{|\tilde{h}_n^m(t, b)|^2\}}}. \tag{5.21}$$

Here, the expression $E\{|\cdot|^2\}$ refers to estimation of the squared signal envelope.
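
For a single band and time instant, the correction of Eq. (5.21) reduces to one gain per
order. The sketch below assumes that the squared-envelope estimates are already
available; the numbers are made up to show a too-bright first order, and the function
name is hypothetical.

```python
import math

def correction_gain(energy_omni, energies_order_n):
    """Common gain for the 2n+1 channels of one order n in one band, after Eq. (5.21)."""
    n = (len(energies_order_n) - 1) // 2
    total = sum(energies_order_n)
    if total <= 0.0:
        return 0.0
    return math.sqrt((2 * n + 1) * energy_omni / total)

# Illustrative envelope estimates for one band b at one time instant t:
e_omni = 1.0e-3                        # E{|h_0^0|^2}
e_order1 = [2.0e-3, 1.0e-3, 3.0e-3]    # E{|h_1^m|^2} for m = -1, 0, 1
g = correction_gain(e_omni, e_order1)
e_corrected = [g * g * e for e in e_order1]
```

After the correction, the summed energy of the order equals $(2n+1)$ times the
omnidirectional energy, as required by Eq. (5.20), while the energy ratios between the
channels of the order, and hereby the encoded direction, remain unchanged.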


Perceptual evaluation. Frank’s 2016 experiments [44] measuring the area of the
sweet spot also investigated the plausibility of reverberation created by
Ambisonically SDM-processed measurements at different order settings, N = 1, 3, 5. For
Fig. 5.18b, listeners indicated at which distance from the room’s center they heard
that envelopment began to collapse to the nearest loudspeakers. One can observe that
rendering diffuse reverberation for a large audience benefits from a high Ambisonic
order. Moreover, experiments in [43] revealed an improvement of the perceived spatial
depth mapping, i.e. a clearer separation between foreground and background
sound for the SDM-processed higher-order reverberation, cf. Fig. 1.21b.

Fig. 5.18 The perceptual sweet spot size as investigated by Frank [44] for SDM-processed RIRs
covers an area in the IEM CUBE that increases with the SDM order N chosen (black = 5th, gray = 3rd,
light gray = 1st order Ambisonics). In comparison to panned direct sound, one should keep some
distance to the loudspeakers to avoid breakdown of envelopment


5.8 Resolution Enhancement: DirAC, HARPEX,
COMPASS


The concept of parametric audio processing [5] describes ways to obtain resolution- 
enhanced first-order Ambisonic recordings by parametric decomposition and render- 
ing. One main idea is to decompose short-term stationary signals of a sound scene 
into a directional and a less directional diffuse stream. 

For synthesis of the directional part based on mono signals, it is clear how to obtain
the most narrow presentations by amplitude panning or higher-order Ambisonic
panning of consistent $\boldsymbol{r}_E$ vector predictions as in Chap. 2.

The synthesis of diffuse and enveloping parts based on a mono signal can require
extra processing such as either widening/diffuseness effects or reverberation as in
Sects. 5.5 and 5.6, which both also provide a directionally wide distribution of sound.
Or more practically, the recording itself could deliver sufficiently many uncorrelated
instances of the diffuse sound to be played back by surrounding virtual sources.
Envelopment and diffuseness are based on providing a consistently low interaural
covariance or cross correlation, i.e. a sufficiently high decorrelation.


DirAC. A main goal of DirAC (Directional Audio Coding [5]) is finding signals
and parameters for sound rendering by analyzing first-order Ambisonic recordings.
One variant is to use the intensity-vector-based analysis in the short-term Fourier
transform (STFT), see also Appendix A.6.2:

$$\boldsymbol{r}_{\mathrm{DOA}}(t, \omega) = -\frac{\rho c\,\mathrm{Re}\{p(t, \omega)^*\,\boldsymbol{v}(t, \omega)\}}{|p(t, \omega)|^2} = \frac{\mathrm{Re}\{W(t, \omega)^*\,[X(t, \omega), Y(t, \omega), Z(t, \omega)]^{\mathrm T}\}}{\sqrt{2}\,|W(t, \omega)|^2}, \tag{5.22}$$

which can be treated similarly as the $\boldsymbol{r}_E$ vector, regarding direction and diffuseness
$\psi = 1 - \|\boldsymbol{r}_{\mathrm{DOA}}\|^2$.

Single-channel DirAC is Ville Pulkki’s original way to decompose the $W(t, \omega)$ signal
in the STFT domain into a directional signal $\sqrt{1 - \psi}\,W(t, \omega)$ that is synthesized by
amplitude panning and a diffuse signal $\sqrt{\psi}\,W(t, \omega)$ to be synthesized diffusely [45].
Virtual-microphone DirAC uses a first-order Ambisonic decoder to the given
loudspeaker layout and time-frequency-adaptive sharpening masks increasing the
focus of direct sounds, see Vilkamo [46] and [5, Ch. 6], or e.g. Sect. 5.2.3.
Playback of diffuse sounds benefits from an optional diffuseness effect.
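
For a single time-frequency bin, the analysis of Eq. (5.22) and the single-channel
decomposition can be sketched as follows. The sketch assumes complex STFT coefficients in
a traditional B-format scaling in which W carries a factor of $1/\sqrt{2}$ relative to
X, Y, Z; practical implementations additionally average over time, which is omitted
here. The function names and the regularization `eps` are hypothetical.

```python
import math

def dirac_analysis(W, X, Y, Z, eps=1e-12):
    """DOA vector and diffuseness of one time-frequency bin, following Eq. (5.22)."""
    denom = math.sqrt(2.0) * abs(W) ** 2 + eps
    r = [(W.conjugate() * c).real / denom for c in (X, Y, Z)]
    psi = 1.0 - sum(v * v for v in r)
    return r, min(max(psi, 0.0), 1.0)          # keep the diffuseness within [0, 1]

def single_channel_streams(W, psi):
    """Split W into a directional and a diffuse stream with energy-preserving weights."""
    return math.sqrt(1.0 - psi) * W, math.sqrt(psi) * W

# One plane wave from the front:
s = complex(0.3, -0.4)
W, X, Y, Z = s / math.sqrt(2.0), s, 0j, 0j
r_doa, psi = dirac_analysis(W, X, Y, Z)
direct, diffuse = single_channel_streams(W, psi)
```

A single plane wave yields a vector of unit length pointing to its direction and a
diffuseness close to zero, so that nearly all of W is assigned to the directional stream.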


HARPEX (high angular-resolution plane-wave expansion [47]) is Svein Berge’s
patented solution to optimally decode sub-band signals. It is based on the observation
he made with Natasha Barrett that decoding to a tetrahedral loudspeaker layout is
perceptually superior if the tetrahedron nodes are rotationally aligned with
the sources of the recording. HARPEX accomplishes convincing diffuse and direct
sound reproduction by decoding to a variably adapted virtual loudspeaker layout in
every sub-band. The layout is adaptively rotation-aligned with sources detected in
the band. HARPEX is typically described using an estimator for direction pairs.


COMPASS (COding and Multidirectional Parameterization of Ambisonic Sound
Scenes [48]) by Archontis Politis can be seen as an extension of DirAC. In contrast
to DirAC, it tries to detect and separate multiple direct sound sources from the ambient
or background sound. This is done by applying two different kinds of beamformers:
one that contains only the direct sound for each sound source (source signals) and
one that contains everything but the direct sound (ambient signal). As before,
the source signals are reproduced using amplitude panning and the ambient signal is
sent to a decorrelator. In contrast to DirAC, COMPASS is not limited to first-order
input but can also enhance the spatial resolution of higher-order inputs.


5.9 Practical Free-Software Examples 


5.9.1 IEM, ambix, and mcfx Plug-In Suites 


The ambix_converter is an important tool when adapting between the different
Ambisonic scaling conventions, e.g. the standard SN3D normalization that uses only
$\sqrt{\frac{(n-|m|)!}{(n+|m|)!}\,\frac{2-\delta_m}{4\pi}}$ for normalization instead of the full
$\sqrt{\frac{(n-|m|)!}{(n+|m|)!}\,\frac{(2-\delta_m)(2n+1)}{4\pi}}$ that is called
N3D, see Fig. 5.19. This alternative definition is due to a practical choice of the
ambix format [49] to avoid high-order channels becoming louder than the zeroth-order
channel. The plug-in also permits adapting between channel sequences such as ACN’s
$i = n^2 + n + m$ or SID’s $i = n^2 + 2(n - |m|) + (m < 0)$. It is advisable to use test
recordings with the main directions, e.g. front, left, top, and to check that the channel
separation for decoded material roughly exceeds 20 dB for 5th-order material.
Moreover, it contains inversion of the Condon-Shortley phase that typically causes
a 180° rotation around the z axis, and it contains the left-right, front-back, and
top-bottom flips discussed in the mirroring operations above.
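
The two channel sequences and the per-order scaling between SN3D and N3D follow directly
from the expressions above and can be sketched as follows; the function names are
illustrative and do not refer to the plug-in's code.

```python
import math

def acn(n, m):
    """Ambisonic channel number (ACN) of order n and degree m."""
    return n * n + n + m

def sid(n, m):
    """Single index designation (SID) of order n and degree m."""
    return n * n + 2 * (n - abs(m)) + (1 if m < 0 else 0)

def sn3d_to_n3d(n):
    """Gain converting an SN3D-normalized channel of order n to N3D."""
    return math.sqrt(2 * n + 1)

order = 1
pairs = [(n, m) for n in range(order + 1) for m in range(-n, n + 1)]
acn_indices = [acn(n, m) for n, m in pairs]
sid_indices = [sid(n, m) for n, m in pairs]
gains = [sn3d_to_n3d(n) for n, _ in pairs]
```

For first order, the pairs (n, m) = (0, 0), (1, −1), (1, 0), (1, 1) obtain the ACN
indices 0, 1, 2, 3, whereas SID assigns 0, 2, 3, 1, so converting between both
sequences is a mere permutation of channels.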


Fig. 5.19 ambix_converter plug-in
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The ambix_warping plugin, see Fig. 5.20, implements the above-mentioned 
warping operations shifting horizontal sounds towards one of the poles, or into both 
polar directions. Warping can be applied to any other direction than zenith and 
nadir when placing it between two mutually inverting ambix_rotation or IEM 
SceneRotator objects that intermediately rotate zenith to another direction. 

The IEM SceneRotator, like the ambix_rotation plug-in, can be controlled by
head tracking and is essential for an immersive headphone-based experience, see
Fig. 5.21. Its processing is done as described above.

The ambix_directional_loudness plug-in in Fig. 5.22 implements the above-mentioned
directional amplitude window in either circular or equi-rectangular spherical
shape. Several of these windows can be created, soloed, and remote controlled,
each of which allows setting a gain for the inside and outside region. This is
often useful in practice, e.g., when reinforcing or attenuating desired or undesired
signal parts within an Ambisonic scene.


Fig. 5.20 ambix_warping plug-in in Reaper

Fig. 5.21 IEM SceneRotator and ambix_rotator plug-ins




Fig. 5.22 ambix_directional_loudness plug-in

Fig. 5.23 EnergyVisualizer plug-in 


To observe the changes made to the Ambisonic scene, the IEM EnergyVisualizer
can be helpful, see Fig. 5.23.

If, for instance, the Ambisonic scene requires dynamic compression, as outlined
in the section above, the IEM OmniCompressor is a helpful tool. It uses the
omnidirectional Ambisonic channel to derive the compression gains (as a side-chain for all
other Ambisonic channels). Similar to the ambix_directional_loudness plug-in, the
IEM DirectionalCompressor allows selecting a window, but this time for setting
different dynamic compression within and outside the selected window, see Fig. 5.24.

The multichannel mcfx_filter plug-in in Fig. 5.25 not only implements a set
of parametric equalizers and a low and high cut whose filter skirts can be toggled
between 2nd and 4th order, it also features a real-time spectrum analyzer to observe
the changes made to the signal. It is practical not only for Ambisonic purposes: it is
simply a set of parametric filters that is applied equally to all channels and controlled
from one interface.

Fig. 5.25 mcfx_filter plug-in

The mcfx_convolver plug-in in Fig. 5.26 is useful for many purposes, also
scientific ones, e.g., when testing binaural filters or driving multi-channel arrays with
filters. Its configuration files use the jconvolver format that specifies which
filter file (typically stored in multi-channel wav files) connects which of its multiple
inlets to which of its multiple outlets. It is also used to implement the SDM-based
reverberation described in the above sections.

For computationally cheaper reverberation, the IEM FDNReverb, which implements the
feedback delay network described above, can be used, see Fig. 5.27. It is not
specifically an Ambisonic tool, but can be used in any multi-channel environment. A
particular feature of the implementation in the IEM suite is that a slow onset can be
adjusted.

Fig. 5.26 mcfx_convolver plug-in

Fig. 5.27 FDNReverb plug-in

The ambix_widening plug-in in Fig. 5.28 implements the widening by frequency-dependent,
dispersive rotation of the Ambisonic scene around the z axis as described
above. With time constant settings exceeding 5 ms, it can also be used to cheaply
stylize lateral reflections instead of the IEM RoomEncoder (Fig. 4.36), or just as a
widening effect. The setting single-sided permits suppressing the slow attack of the
Bessel sequence.

Fig. 5.28 ambix_widening plug-in in Reaper

Another quite helpful tool is the mcfx_gain_delay plug-in in Fig. 5.29. It permits
to solo or mute individual channels, as well as to delay and attenuate them differently.
What is often even more useful: it is invaluable for testing the signal
chain, as one can step through the channels with different signals.

Fig. 5.29 mcfx_gain_delay plug-in


5.9.2 Aalto SPARTA 


The SPARTA plug-in suite by Aalto University provides Ambisonic tools for encoding,
decoding on loudspeakers and headphones, as well as visualization. A special
feature is the COMPASS decoder plug-in in Fig. 5.30 that can increase the spatial
resolution of first-, second-, and third-order recordings. Playback can be done either
on arbitrary loudspeaker arrangements or their virtualization on headphones. The
signal-dependent parametric processing allows adjusting the balance between direct
and diffuse sound in each frequency band. In order to suppress artifacts due to the
processing, the parametric playback (Par) can be mixed with the static decoding
(Lin) of the original recording. In general, it is advisable to keep the parametric
contribution below 2/3 to obtain noticeable directional improvements with low
artifacts. In recordings with cymbals or hi-hats, it is moreover advisable to fade
towards Lin starting at around 4 kHz.

Fig. 5.30 COMPASS Decoder plug-in


5.9.3 Røde


The Soundfield plug-in by Røde in Fig. 5.31 was originally designed to process the 
signals from the four cardioid microphone capsules of their Soundfield microphone. 
However, it also supports first-order Ambisonics as input format. It can decode to 
various loudspeaker arrangements by placing virtual microphones into the directions 
of the loudspeakers. The directivity of each virtual microphone can be adjusted 
between first-order cardioid and hyper-cardioid. Moreover, higher-order directivity 
patterns are possible using a parametric signal-dependent processing, resulting in an 
increase of the spatial resolution.

Fig. 5.31 Soundfield by Røde plug-in


Chapter 6
Higher-Order Ambisonic Microphones
and the Wave Equation (Linear, Lossless)


...a turning point has been the design of HOA microphones, 
opening an exciting experimental field in terms of real 3D sound 
field recording ... 


Jérôme Daniel [1] at Ambisonics Symposium 2009. 


Abstract Unlike pressure-gradient transducers, single-transducer microphones with
higher-order directivity apparently turned out to be difficult to manufacture at
reasonable audio quality. Therefore nowadays, higher-order Ambisonic recording with
compact devices is based on compact spherical arrays of pressure transducers. To
prepare for higher-order Ambisonic recording based on arrays, we first need a model
of the sound pressure that the individual transducers of such an array would receive
in an arbitrary surrounding sound field. The lossless, linear wave equation is the
most suitable model to describe how sound propagates when the sound field is
composed of surrounding sound sources. Fundamentally, the wave equation models sound
propagation by how small packages of air react (i) to being expanded or compressed,
by a change of the internal pressure, and (ii) to directional differences in the
outside pressure, by starting to move. Based thereupon, the inhomogeneous solutions
of the wave equation describe how an entire free sound field builds up when
excited by an omnidirectional sound source, as a simplified model of an arbitrary
physical source, such as a loudspeaker, human talker, or musical instrument. After
addressing these basics, the chapter shows a way to get Ambisonic signals of high
spatial and timbral quality from the array signals, considering the necessary
diffuse-field equalization, side-lobe suppression, and trade-off between spatial resolution
and low-frequency noise boost. The chapter concludes with application examples.


Gary Elko and Jens Meyer are the well-known inventors of the first commercially
available compact spherical microphone array that is able to record higher-order
Ambisonics [2], the Eigenmike. There are several inspiring scientific works with
valuable contributions that can be recommended for further reading [3–12], above
all Boaz Rafaely’s excellent introductory book [13].

This mathematical theory might appear extensive, but it cannot be avoided when
aiming at an in-depth understanding of higher-order Ambisonic microphones. The
theory enables processing of the microphone signals received such that the surrounding
sound field excitation is retrieved in terms of an Ambisonic signal. Some readers
may want to skip the physical introduction and resume in Sect. 6.5 on spherical
scattering or Sect. 6.6 on the processing of the array signals.


6.1 Equation of Compression 


Wave propagation involves reversible short-term temperature fluctuations becoming
effective when air is being compressed by sound, causing the specific stiffness of
air in sound propagation. The Appendix A.6.1 shows how to derive this adiabatic
compression relation based on the first law of thermodynamics and the ideal gas
law. It relates the relative volume change $\frac{\Delta V}{V_0}$ to the pressure change $p = -K\,\frac{\Delta V}{V_0}$
by the bulk modulus of air. After expressing the bulk modulus by more common
constants¹ $K = \rho c^2$ and differentially formulating the volume change over time
using the change of the sound particle velocity in space, e.g. in one dimension
$\frac{\partial p}{\partial t} = -\rho c^2\,\frac{\partial v_x}{\partial x}$, cf. Appendix A.6.1, we get the three-dimensional compression equation:

$$\frac{\partial p}{\partial t} = -\rho c^2\,\nabla^{\mathrm T}\boldsymbol{v}. \tag{6.1}$$

Here the inner product of the Del symbol $\nabla^{\mathrm T} = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)$ with $\boldsymbol{v}$ yields what is
called divergence $\mathrm{div}(\boldsymbol{v}) = \nabla^{\mathrm T}\boldsymbol{v} = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z}$. The equation means: Independently
of whether the outer boundaries of a small package of air are traveling at a
common velocity: If there are directions into which their velocity is spatially increasing,
the resulting gradual volume expansion over time causes a proportional decrease
of interior pressure over time.


6.2 Equation of Motion 


The equation of motion is relatively simple to understand from the Newtonian equation
of motion, e.g. for the x direction, $F_x = m\,\frac{\partial v_x}{\partial t}$, which equates the external force to
mass $m$ times acceleration, i.e. increase in velocity $\frac{\partial v_x}{\partial t}$. For a small package of
air with constant volume $V_0 = \Delta x\,\Delta y\,\Delta z$, the mass is obtained by the air
density $m = \rho V_0$, and the force equals the decrease in pressure over the three
space directions, times the corresponding partial surface, e.g. for the x direction
$F_x = -[p(x + \Delta x) - p(x)]\,\Delta y\,\Delta z$. For the x direction, this yields after expanding
by $\frac{\Delta x}{\Delta x}$

$$-\frac{\Delta p}{\Delta x}\,V_0 = \rho V_0\,\frac{\partial v_x}{\partial t}.$$

Dividing by $-V_0$ and letting $V_0 \to 0$, we obtain the typical shape of the equation of
motion for all three space directions

$$\nabla p = -\rho\,\frac{\partial \boldsymbol{v}}{\partial t}. \tag{6.2}$$

The equation of motion means: Independently of the common exterior pressure load
on all the outer boundaries of a small air package, an outer pressure decrease
into any direction implies a corresponding pushing force on the package causing a
proportional acceleration into this direction.

¹ Typical constants are: density $\rho$ = 1.2 kg/m³, speed of sound $c$ = 343 m/s.


6.3 Wave Equation 


We can combine the compression equation ® — —p c? V"y with the equation of 


at 
motion V p = —p ae by deriving the first one with regard to time ae = -pe V" oy 
and the second one with the gradient VT yielding the Laplacian VTV = A, hence 


= Tov ate A x . 
Ap = —pV ;,- Division of the first result by c^ and equating both terms yields the 


lossless wave equation A p = 4 = p that is typically written as 


(A Le )p =0. (6.3) 


c? ðt? 


Obviously, the wave equation relates the curvature in space (expressed by the Lapla- 
cian) to curvature in time (expressed by the second-order derivative). 

If $p$ is a pure sinusoidal oscillation $\sin(\omega t + \phi)$, the second derivative in time corresponds to a factor $-\omega^2$, and by substitution with the wave number $k = \frac{\omega}{c}$, we can write the frequency-domain wave equation, the Helmholtz equation, as

$$(\Delta + k^2)\,p = 0. \tag{6.4}$$
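As a brief worked step, the substitution behind Eq. (6.4) can be written out as follows. This is only a sketch: the amplitude symbol $\hat p$ and the example frequency of 1 kHz are chosen here for illustration, and $c = 343$ m/s is the typical constant quoted above.

```latex
\begin{align*}
p(\boldsymbol{r},t) &= \hat p(\boldsymbol{r})\,\sin(\omega t+\phi)
  \;\Rightarrow\; \frac{\partial^2 p}{\partial t^2} = -\omega^2\,p, \\
\Big(\Delta - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Big)p
  &= \Big(\Delta + \frac{\omega^2}{c^2}\Big)p = (\Delta + k^2)\,p = 0, \\
f = 1\,\mathrm{kHz}:\quad
k &= \frac{2\pi f}{c} = \frac{2\pi\cdot 1000}{343}\,\mathrm{m^{-1}} \approx 18.3\,\mathrm{m^{-1}},
\qquad \lambda = \frac{c}{f} \approx 0.343\,\mathrm{m}.
\end{align*}
```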


6.3.1 Elementary Inhomogeneous Solution: Green’s 
Function (Free Field) 


The Green’s function is an elementary prototype for solutions to inhomogeneous problems $(\Delta + k^2)\,p = -q$, which is defined as

$$(\Delta + k^2)\,G = -\delta.$$

A general excitation $q$ of the equation can be represented by its convolution with the Dirac delta distribution, $\int q(\boldsymbol{s})\,\delta(\boldsymbol{r}-\boldsymbol{s})\,\mathrm{d}V(\boldsymbol{s}) = q(\boldsymbol{r})$. As the wave equation is linear, the general solution must therefore also equal the convolution of the Green’s function with the excitation function, $p(\boldsymbol{r}) = \int q(\boldsymbol{s})\,G(\boldsymbol{r}-\boldsymbol{s})\,\mathrm{d}V(\boldsymbol{s})$, over space; if formulated in the time domain: also over time. The integral superimposes acoustical responses of any point in time and space of the source phenomenon, weighted by the corresponding source strength in space and time.

The Green’s function in three dimensions is derived in Appendix A.6.3, Eq. (A.91),

$$G = \frac{e^{-ikr}}{4\pi r}, \tag{6.5}$$

with the wave number $k = \frac{\omega}{c}$ and the distance between source and receiver $r = \sqrt{\boldsymbol{r}^T\boldsymbol{r}} = \|\boldsymbol{r}\|$.

Acoustic source phenomena are characterized by the behavior of the Green’s function: far away, the amplitude decays with $\frac{1}{r}$ and the phase $-kr = -\omega\frac{r}{c}$ corresponds to the radially increasing delay $\frac{r}{c}$. Both are expressed in Sommerfeld’s radiation condition $\lim_{r\to\infty} r\,\big(\frac{\partial p}{\partial r} + ik\,p\big) = 0$.


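Plane waves. The radius coordinate of the Green’s function is the distance between two Cartesian position vectors $\boldsymbol{r}_s$ and $\boldsymbol{r}$, the source and receiver location. Letting one of them become large is denoted by re-expressing it in terms of radius and direction vector, $\boldsymbol{r}_s = r_s\,\boldsymbol{\theta}_s$. This permits the far-field approximation

$$\|\boldsymbol{r}_s - \boldsymbol{r}\| = \sqrt{(r_s\boldsymbol{\theta}_s - \boldsymbol{r})^T(r_s\boldsymbol{\theta}_s - \boldsymbol{r})} = \sqrt{r_s^2 - 2\,r_s\,\boldsymbol{\theta}_s^T\boldsymbol{r} + r^2}, \tag{6.6}$$

$$\lim_{r_s\to\infty}\|\boldsymbol{r}_s - \boldsymbol{r}\| = \lim_{r_s\to\infty} r_s\sqrt{1 - 2\,\frac{\boldsymbol{\theta}_s^T\boldsymbol{r}}{r_s} + \frac{r^2}{r_s^2}} = r_s - \boldsymbol{\theta}_s^T\boldsymbol{r}, \qquad \big(\text{with } \lim_{x\to 0}\sqrt{1-2x} = 1-x\big).$$

For the phase approximation, for instance at a wave length of 30 cm, we notice that even a relatively small distance difference, e.g. between 15 m and 15 m + 15 cm, could change the sign of the wave. To approximate the phase of the Green’s function, we must therefore at least use $r_s - \boldsymbol{\theta}_s^T\boldsymbol{r}$ as approximation. By contrast, this level of precision is irrelevant for the magnitude approximation, e.g., it would be negligible if we used $\frac{1}{15}$ instead of the magnitude $\frac{1}{15.15}$.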
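The phase and magnitude argument above can be checked numerically. The following is a minimal sketch using the text's example (30 cm wave length, 15 m source distance, 15 cm offset); the source direction, the receiver position, and all function and variable names are assumptions made here for illustration.

```python
import cmath
import math

def green(distance, k):
    """Free-field Green's function exp(-i k r) / (4 pi r), cf. Eq. (6.5)."""
    return cmath.exp(-1j * k * distance) / (4 * math.pi * distance)

wavelength = 0.30                 # m, example from the text
k = 2 * math.pi / wavelength      # wave number in rad/m
r_s = 15.0                        # source distance in m
theta_s = (1.0, 0.0, 0.0)         # assumed unit vector towards the source
r = (0.15, 0.0, 0.0)              # assumed receiver position, 15 cm off the origin

# Exact source-receiver distance ||r_s * theta_s - r||
exact = math.sqrt(sum((r_s * t - x) ** 2 for t, x in zip(theta_s, r)))
# Far-field distance r_s - theta_s^T r
projected = sum(t * x for t, x in zip(theta_s, r))
far = r_s - projected

g_exact = green(exact, k)
# Phase taken from r_s - theta_s^T r, magnitude simply from 1 / r_s
g_plane = green(r_s, k) * cmath.exp(1j * k * projected)
# Crude variant that also ignores theta_s^T r in the phase
g_crude = green(r_s, k)

magnitude_error_db = 20 * math.log10(abs(g_plane) / abs(g_exact))
print("distance error of far-field form / m:", abs(far - exact))
print("magnitude error when using 1/r_s / dB:", round(magnitude_error_db, 3))
print("ratio crude/exact (sign flips):", g_crude / g_exact)
```

In this configuration the magnitude error stays below a tenth of a decibel, while dropping $\boldsymbol{\theta}_s^T\boldsymbol{r}$ from the phase inverts the sign of the result.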

At a large distance $r_s$ assumed to be constant, the Green’s function is proportional to a plane wave from the source direction $\boldsymbol{\theta}_s$

$$\lim_{r_s\to\infty} G = \frac{e^{-ikr_s}}{4\pi r_s}\,e^{ik\,\boldsymbol{\theta}_s^T\boldsymbol{r}}. \tag{6.7}$$

The plane-wave part is of unit magnitude $|p| = 1$

$$p = e^{ik\,\boldsymbol{\theta}_s^T\boldsymbol{r}}, \tag{6.8}$$

and its phase evaluates the projection of the position vector onto the plane-wave arrival direction $\boldsymbol{\theta}_s$. Towards the direction $\boldsymbol{\theta}_s$, the phase grows positive, i.e. the signal arrives earlier. Towards the plane-wave propagation direction $-\boldsymbol{\theta}_s$, the phase grows negatively, implying an increasing time delay, which is constant on any plane perpendicular to $\boldsymbol{\theta}_s$.

Plane waves are an invaluable tool to locally approximate sound fields from sources that are sufficiently far away, within a small region.²


6.4 Basis Solutions in Spherical Coordinates 


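Figure 4.11 shows spherical coordinates [14, 15] using radius $r$, azimuth $\varphi$, and zenith $\vartheta$. For simplification, zenith is replaced by $\zeta = \cos\vartheta = \frac{z}{r}$ here. We may solve the Helmholtz equation $(\Delta + k^2)\,p = 0$ in spherical coordinates by the radial and directional parts of the Laplacian $\Delta = \Delta_r + \Delta_{\varphi,\zeta}$, as identified in Appendix A.3

$$\Delta = \underbrace{\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}}_{\Delta_r} + \underbrace{\frac{1}{r^2}\Big[(1-\zeta^2)\frac{\partial^2}{\partial \zeta^2} - 2\zeta\frac{\partial}{\partial \zeta} + \frac{1}{1-\zeta^2}\frac{\partial^2}{\partial \varphi^2}\Big]}_{\Delta_{\varphi,\zeta}}. \tag{6.9}$$

We already know the spherical harmonics as directional eigensolutions from Sect. 4.7

$$\Delta_{\varphi,\zeta}\,Y_n^m = -\frac{n(n+1)}{r^2}\,Y_n^m, \tag{6.10}$$

and assume them to be a factor of the solution $p_n^m = R\,Y_n^m$, determining the value of $\Delta_{\varphi,\zeta}$ in $(\Delta_r + k^2 + \Delta_{\varphi,\zeta})\,p_n^m = 0$. We find a separated radial differential equation after insertion, multiplication by $\frac{r^2}{Y_n^m}$, and re-expressing the differentials $\frac{\partial}{\partial r} = k\,\frac{\partial}{\partial (kr)}$ and $\frac{\partial^2}{\partial r^2} = k^2\,\frac{\partial^2}{\partial (kr)^2}$

$$\Big[(kr)^2\frac{\partial^2}{\partial (kr)^2} + 2\,(kr)\frac{\partial}{\partial (kr)} + (kr)^2 - n(n+1)\Big]\,R = 0. \tag{6.11}$$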


Appendix A.6.4 shows how to get physical solutions for $R$ of this so-called spherical Bessel differential equation: spherical Hankel functions of the second kind $h_n^{(2)}(kr)$ are able to represent radiation (radially outgoing into every direction), consistently with Green’s function $G$, diverging with an $(n+1)$-fold pole at $kr = 0$, a physical behavior that would also be observed after spatially differentiating $G$, see Fig. 6.1; spherical Bessel functions $j_n(kr) = \Re\{h_n^{(2)}(kr)\}$ are real-valued, converge everywhere, exhibit an $n$-fold zero at $kr = 0$, and can’t represent radiation.

Fig. 6.1 Spherical Bessel functions $j_n(kr) = \Re\{h_n^{(2)}(kr)\}$ (top left), imaginary part of spherical Hankel functions $\Im\{h_n^{(2)}(kr)\}$ (top right), and magnitude/dB of $|h_n^{(2)}(kr)|$ (bottom), over $kr$

² This is because, strictly speaking, an entire plane-wave sound field is unphysical and of infinite energy: either the exhaustive in-phase vibration of an infinite plane is required, or an infinite-amplitude point source infinitely far away is required with infinite anticipation $t \to +\infty$ (non-causal).

Implementations typically rely on the accurate standard libraries implementing cylindrical Bessel and Hankel functions:

$$j_n(kr) = \sqrt{\frac{\pi}{2\,kr}}\;J_{n+\frac{1}{2}}(kr), \qquad h_n^{(2)}(kr) = \sqrt{\frac{\pi}{2\,kr}}\;H_{n+\frac{1}{2}}^{(2)}(kr). \tag{6.12}$$
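Where no Bessel library is at hand, the radial functions of low order can also be sketched with the standard three-term recurrence instead of Eq. (6.12). The code below is such a sketch and not the book's implementation; the function names are hypothetical, and with SciPy installed one would more likely use `scipy.special.spherical_jn` and `scipy.special.spherical_yn`. It checks the link to the Green's function, $G = -\frac{ik}{4\pi}\,h_0^{(2)}(kr)$, and the cross-product relation $j_{n+1}y_n - j_n y_{n+1} = \frac{1}{x^2}$.

```python
import cmath
import math

def spherical_jy(n_max, x):
    """Spherical Bessel functions j_n(x), y_n(x) for n = 0..n_max (x > 0).

    Upward recurrence f_{n+1} = (2n+1)/x * f_n - f_{n-1}; stable for y_n,
    but j_n loses accuracy once n clearly exceeds x.
    """
    j = [math.sin(x) / x, math.sin(x) / x**2 - math.cos(x) / x]
    y = [-math.cos(x) / x, -math.cos(x) / x**2 - math.sin(x) / x]
    for n in range(1, n_max):
        j.append((2 * n + 1) / x * j[n] - j[n - 1])
        y.append((2 * n + 1) / x * y[n] - y[n - 1])
    return j[: n_max + 1], y[: n_max + 1]

def spherical_h2(n_max, x):
    """Spherical Hankel functions of the second kind, h_n = j_n - i y_n."""
    j, y = spherical_jy(n_max, x)
    return [complex(jn, -yn) for jn, yn in zip(j, y)]

k, r = 2.0, 1.5                       # example wave number (rad/m) and radius (m)
j, y = spherical_jy(5, k * r)
h = spherical_h2(5, k * r)

green = cmath.exp(-1j * k * r) / (4 * math.pi * r)     # Eq. (6.5)
green_from_h0 = -1j * k / (4 * math.pi) * h[0]
print("G from Eq. (6.5):   ", green)
print("G from h_0^(2)(kr): ", green_from_h0)
```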


Wave spectra and spherical basis solutions. Any sound field evaluated at a radius $r$ where the air is source-free and homogeneous in any direction can be represented by spherical basis functions for enclosed fields, $j_n(kr)\,Y_n^m(\boldsymbol{\theta})$, and radiating fields, $h_n^{(2)}(kr)\,Y_n^m(\boldsymbol{\theta})$,

$$p = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\big[b_{nm}\,j_n(kr) + c_{nm}\,h_n^{(2)}(kr)\big]\,Y_n^m(\boldsymbol{\theta}). \tag{6.13}$$

Here, $b_{nm}$ are the coefficients for incoming waves that pass through and emanate from radii larger than $r$, and $c_{nm}$ are the coefficients of outgoing waves radiating from sources at radii smaller than $r$; the coefficients are called wave spectra of the incoming and outgoing waves, cf. [16].


Ambisonic plane-wave spectrum, plane wave. Plane waves only use the coefficients $b_{nm}$, while $c_{nm} = 0$ in Eq. (6.13). The sum of incoming plane waves from all directions, whose amplitudes are given by the spherical harmonics coefficients $\chi_{nm}$ as a set of Ambisonic signals, is described by the incoming wave spectrum, see Appendix A.6.5, Eq. (A.119),

$$b_{nm} = 4\pi\,i^n\,\chi_{nm}. \tag{6.14}$$

Figure 6.2 shows a single plane wave incoming from the direction $\boldsymbol{\theta}_s$, represented by

$$b_{nm} = 4\pi\,i^n\,Y_n^m(\boldsymbol{\theta}_s), \tag{6.15}$$

at four different time steps corresponding to 0°, 60°, 120° and 180° time shifts for the two wave lengths shown.

Fig. 6.2 Plane wave from the y axis, $\varphi = \vartheta = \frac{\pi}{2}$, in horizontal cross section; time steps correspond to 0°, 60°, 120°, and 180° phase shifts $\phi$ in the plot $\Re\{p\,e^{i\phi}\}$ showing $p$ from Eq. (6.13) with $c_{nm} = 0$ and $b_{nm}$ of Eq. (6.15) with $b_{nm} = 4\pi\,i^n\,Y_n^m(\frac{\pi}{2},\frac{\pi}{2})$; long wave (top), short wave (bottom); simulation uses $N = 25$ and area shows $|kx|, |ky| < 2\pi$ and $8\pi$
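The expansion of Eqs. (6.13) and (6.15) can be checked against the closed-form plane wave of Eq. (6.8). The sketch below assumes orthonormal spherical harmonics, for which the addition theorem gives $\sum_m Y_n^m(\boldsymbol{\theta}_s)\,Y_n^m(\boldsymbol{\theta}) = \frac{2n+1}{4\pi}P_n(\cos\gamma)$ with $\gamma$ the angle between the two directions, so that the double sum collapses to a sum over Legendre polynomials. Function names and the test values are illustrative assumptions.

```python
import cmath
import math

def spherical_jn(n, x, terms=60):
    """Spherical Bessel function j_n(x) from its power series (moderate x)."""
    lead = 1.0
    for m in range(1, n + 1):
        lead *= x / (2 * m + 1)            # builds x^n / (2n+1)!!
    term, total = 1.0, 1.0
    for k in range(terms):
        term *= -0.5 * x * x / ((k + 1) * (2 * n + 2 * k + 3))
        total += term
    return lead * total

def legendre(n_max, c):
    """Legendre polynomials P_0(c)..P_n_max(c) by the three-term recurrence."""
    p = [1.0, c]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * c * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]

def plane_wave(kr, cos_gamma, order):
    """Truncated expansion sum_n (2n+1) i^n j_n(kr) P_n(cos gamma)."""
    p_n = legendre(order, cos_gamma)
    return sum((2 * n + 1) * 1j**n * spherical_jn(n, kr) * p_n[n]
               for n in range(order + 1))

kr, cos_gamma = 2 * math.pi, 0.3           # assumed radius and angle to the source
reference = cmath.exp(1j * kr * cos_gamma) # Eq. (6.8) with theta_s^T r = r cos(gamma)
error_n25 = abs(plane_wave(kr, cos_gamma, 25) - reference)
error_n4 = abs(plane_wave(kr, cos_gamma, 4) - reference)
print("truncation error N = 25:", error_n25)
print("truncation error N = 4: ", error_n4)
```

At $kr = 2\pi$ the order-25 expansion matches the plane wave closely, whereas an order-4 truncation leaves a clearly larger error at the same radius.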


6.5 Scattering by Rigid Higher-Order Microphone Surface 


Higher-order Ambisonic microphone arrays are typically mounted on a rigid sphere of some radius $r = a$, such as the Eigenmike EM32, see Fig. 6.3. The physical boundary of the rigid spherical surface is expressed as a vanishing radial component of the sound particle velocity. The radial sound particle velocity is obtained via the equation of motion Eq. (6.2) by deriving Eq. (6.13) with regard to the radius. This requires evaluating the differentiated spherical radial solutions $j_n'(x)$ as well as $h_n'^{(2)}(x)$, which is implemented by $f_n'(x) = \frac{n}{x}\,f_n(x) - f_{n+1}(x)$ for either of the functions, cf. e.g. [16]. A sound-hard boundary condition at the radius $a$ requires

$$v_r\big|_{r=a} = \frac{i}{\rho c}\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\big[b_{nm}\,j_n'(kr) + c_{nm}\,h_n'^{(2)}(kr)\big]_{r=a}\,Y_n^m(\boldsymbol{\theta}) = 0,$$

which is fulfilled by a vanishing term in square brackets. The rigid boundary responds to incoming surround sound by velocity-canceling outgoing waves, $h_n'^{(2)}(ka)\,c_{nm} = -j_n'(ka)\,b_{nm}$. The coefficients $\psi_{nm}$ yield the sound pressure in Fig. 6.4,

$$p = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\psi_{nm}\,Y_n^m(\boldsymbol{\theta}), \quad \text{with } \psi_{nm} = \Big[j_n(kr) - \frac{j_n'(ka)}{h_n'^{(2)}(ka)}\,h_n^{(2)}(kr)\Big]\,b_{nm}. \tag{6.16}$$

The two terms of the bracket are typically further simplified by a common denominator and recognizing the Wronskian Eq. (A.97) in the numerator

$$\frac{j_n(x)\,h_n'^{(2)}(x) - j_n'(x)\,h_n^{(2)}(x)}{h_n'^{(2)}(x)} = \frac{i}{x^2\,h_n'^{(2)}(x)}, \qquad \psi_{nm}\big|_{r=a} = \frac{i}{(ka)^2\,h_n'^{(2)}(ka)}\,b_{nm}. \tag{6.17}$$

Fig. 6.3 32-channel higher-order Ambisonic microphone Eigenmike EM32

Fig. 6.4 Plane waves scattered by a rigid sphere with $ka = \pi$ (top) or $ka = 4\pi$ (bottom); time steps correspond to 0°, 60°, 120°, and 180° phase shifts $\phi$ in the plot $\Re\{p\,e^{i\phi}\}$ showing $p$ from Eq. (6.13) with $b_{nm}$ and $c_{nm}$ from Eq. (6.15) with $b_{nm} = 4\pi\,i^n\,Y_n^m(\frac{\pi}{2},\frac{\pi}{2})$ and Eq. (6.16); simulation uses $N = 25$
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Fig. 6.5 Attenuation/dB of Ambisonic signals of different orders for varying values of ka 


Relation of recorded sound pressure to Ambisonic signal. The scattering equation relates the recorded sound pressure expanded in spherical harmonics to the Ambisonic signal of the surround sound scene, see frequency responses in Fig. 6.5,

$$\psi_{nm} = \frac{4\pi\,i^{n+1}}{(ka)^2\,h_n'^{(2)}(ka)}\,\chi_{nm}. \tag{6.18}$$

It is formally convenient that as soon as the sound pressure is given in terms of its spherical harmonic coefficient signals $\psi_{nm}$, the Ambisonic signals $\chi_{nm}$ of a concentric playback system are just an inversely filtered version thereof, with no need for further unmixing/matrixing.

Recognizable from Fig. 6.6 and following our intuition, waves of lengths larger than the diameter $2a$ of the sphere will only weakly map to complicated high-order patterns. It is therefore easily understood that the transfer function $i^{n+1}\,[(ka)^2\,h_n'^{(2)}(ka)]^{-1}$ attenuates the reception of high-order Ambisonic signals at low frequencies, see Fig. 6.5.
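The low-frequency attenuation of the higher orders can be reproduced with a few lines. This is a rough sketch rather than the code behind Fig. 6.5: it evaluates only the magnitude of $[(ka)^2\,h_n'^{(2)}(ka)]^{-1}$, leaving out the constant factor $4\pi\,i^{n+1}$, and it uses an upward recurrence for the Hankel functions, which is adequate here because the $y_n$ part dominates at small $ka$. The function names and the printed frequencies are assumptions.

```python
import math

def spherical_h2(n_max, x):
    """h_n^(2)(x) = j_n(x) - i y_n(x) for n = 0..n_max by upward recurrence."""
    h = [complex(math.sin(x) / x, math.cos(x) / x),
         complex(math.sin(x) / x**2 - math.cos(x) / x,
                 math.cos(x) / x**2 + math.sin(x) / x)]
    for n in range(1, n_max):
        h.append((2 * n + 1) / x * h[n] - h[n - 1])
    return h[: n_max + 1]

def rigid_sphere_gain_db(n, ka):
    """Magnitude/dB of 1 / ((ka)^2 h_n'^(2)(ka)) for order n."""
    h = spherical_h2(n + 1, ka)
    dh = n / ka * h[n] - h[n + 1]          # f_n'(x) = n/x f_n(x) - f_{n+1}(x)
    return -20 * math.log10(abs(ka**2 * dh))

a, c = 0.042, 343.0                        # em32-like radius in m, speed of sound in m/s
for f in (125, 500, 2000, 8000):
    ka = 2 * math.pi * f * a / c
    gains = [round(rigid_sphere_gain_db(n, ka), 1) for n in range(5)]
    print(f, "Hz, orders 0..4 in dB:", gains)
```

At small $ka$ the zeroth order sits near 0 dB, and each higher order falls off by roughly a further 20 dB per decade, in line with the $n$th-order zero discussed in Sect. 6.8.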


6.6 Higher-Order Microphone Array Encoding 


The block diagram of Ambisonic encoding of higher-order microphone array signals is shown in Fig. 6.7. The first processing step decomposes the pressure samples $\boldsymbol{p}(t)$ from the microphone array into their spherical harmonic coefficients $\boldsymbol{\psi}_N(t)$: To which amount do the samples contain omnidirectional, figure-of-eight, and other spherical harmonic patterns, up to the order that the microphone arrangement allows to decompose? The frequency-independent matrix $(\boldsymbol{Y}_N^T)^\dagger$ does the conversion. It is the left inverse to the spherical harmonics sampled at the microphone positions, as shown in the upcoming section.

The second step then sharpens the sound pressure image to an Ambisonic signal by filtering the spherical harmonic coefficient signals. The basic relation between sound pressure coefficients and Ambisonic signals is given in Eq. (6.18) and describes a filter for every coefficient signal, differing only in filter characteristics for different spherical harmonic orders. Robustness to noise, microphone matching, and positioning is the key here, and only achieved by the careful design of these filters, as shown in the sections below. The design considers a gradually increasing sharpening over frequency, for which it moreover employs a filter bank with separate, max-$r_E$ weighted and $E$-normalized bands, in order to provide (i) limitation of noise and errors, (ii) a frequency response perceived as flat, and (iii) optimal suppression of the sidelobes.

Fig. 6.6 Plane-wave sound pressure image $\Re\{p\,e^{i\phi}\}$ on a rigid sphere with varying $ka$ using $\psi_{nm}$ from Eq. (6.17) expanded over the spherical harmonics $p = \sum_{n,m}\psi_{nm}\,Y_n^m$ and $\chi_{nm} = Y_n^m(0, 0)$ for a plane wave from $z$. With the wave length $\lambda = \frac{c}{f}$, the value $ka$ is related to a diameter $2a$ of $\frac{ka}{\pi} = \frac{2a}{\lambda}$ in wave lengths to express frequency dependency; simulation uses $N = 50$; for $a \approx 4.2$ cm, $ka$ values correspond to $f$ = 125, 250, 500, 1000, 2000, 4000, 8000, 16000 Hz

Fig. 6.7 Higher-order Ambisonic microphone encoding: sound pressure samples $\boldsymbol{p}(t)$ are spherical-harmonics decomposed by the matrix $(\boldsymbol{Y}_N^T)^\dagger$, and the resulting coefficient signals $\boldsymbol{\psi}_N(t)$ are converted to Ambisonic signals $\boldsymbol{\chi}_N(t)$ by the sharpening filters $\rho_n(\omega)$ (block diagram: $\boldsymbol{p}$ → spherical harmonics encoder → $\boldsymbol{\psi}_N$ → radial focusing filters → $\boldsymbol{\chi}_N$)


6.7 Discrete Sound Pressure Samples in Spherical 
Harmonics 


To determine the Ambisonics signals $\chi_{nm}$, we obviously need to find $\psi_{nm}$ based on all sound pressure samples $p(\boldsymbol{\theta}_i)$ recorded by the microphones distributed on the rigid-sphere array. To accomplish this, we set up a system of model equations equating the pressure samples to the unknown coefficients $\psi_{nm}$ expanded over the spherical harmonics $Y_n^m(\boldsymbol{\theta}_i)$ sampled at every microphone position. A vector and matrix notation $\boldsymbol{p} = [p(\boldsymbol{\theta}_i)]_i$ and $\boldsymbol{Y}_N^T = [Y_n^m(\boldsymbol{\theta}_i)]_{i,nm}$ is helpful

$$\begin{bmatrix} p(\boldsymbol{\theta}_1) \\ \vdots \\ p(\boldsymbol{\theta}_M) \end{bmatrix} = \begin{bmatrix} Y_0^0(\boldsymbol{\theta}_1) & \dots & Y_N^N(\boldsymbol{\theta}_1) \\ \vdots & & \vdots \\ Y_0^0(\boldsymbol{\theta}_M) & \dots & Y_N^N(\boldsymbol{\theta}_M) \end{bmatrix} \begin{bmatrix} \psi_{00} \\ \vdots \\ \psi_{NN} \end{bmatrix}, \qquad \boldsymbol{p}_N = \boldsymbol{Y}_N^T\,\boldsymbol{\psi}_N. \tag{6.19}$$

Left inverse (MMSE). The equation can be (pseudo-)inverted if the matrix $\boldsymbol{Y}_N$ is well conditioned. Typically, more microphones are used than coefficients searched, $M > (N+1)^2$. Inversion is a matter of mean-square error minimization: As the $M$ dimensions may contain more degrees of freedom than $(N+1)^2$, the coefficient vector $\boldsymbol{\psi}_N$ giving the closest model $\boldsymbol{p}_N$ to the measurement $\boldsymbol{p}$ is searched,

$$\min_{\boldsymbol{\psi}_N}\|\boldsymbol{e}\|^2, \quad \text{with } \boldsymbol{e} = \boldsymbol{p}_N - \boldsymbol{p} = \boldsymbol{Y}_N^T\,\boldsymbol{\psi}_N - \boldsymbol{p}. \tag{6.20}$$

The minimum-mean-square-error (MMSE) solution is, see Appendix A.4, Eq. (A.65),

$$\boldsymbol{\psi}_N = (\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1}\,\boldsymbol{Y}_N\,\boldsymbol{p} = (\boldsymbol{Y}_N^T)^\dagger\,\boldsymbol{p}. \tag{6.21}$$

The resulting left inverse $(\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1}\boldsymbol{Y}_N$ inverts the thin matrix $\boldsymbol{Y}_N^T$ from the left. $(\boldsymbol{Y}_N^T)^\dagger$ symbolizes the pseudo inverse; it is the left inverse for thin matrices.

If the microphones are arranged in a t-design and the order $N$ is chosen suitably, then the transpose matrix times $\frac{4\pi}{M}$ is equivalent to the left inverse. A more thorough discussion on spherical point sets can be found in [17-19].

The maximum determinant points [20] are a particular kind of critical directional sampling scheme that allows using exactly as few microphones $M = (N+1)^2$ as spherical harmonic coefficients obtained, yielding a well-conditioned square matrix $\boldsymbol{Y}_N$, so that it can be inverted directly without left/pseudo-inversion. The 25 maximum-determinant points for $N = 4$ are used in the simulation example below.³
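The t-design shortcut for the left inverse can be illustrated on a toy layout. The sketch below assumes a hypothetical 6-microphone array on the vertices of an octahedron (a spherical 3-design), first-order decomposition ($N = 1$), and orthonormal real spherical harmonics; none of this corresponds to a specific commercial array. For this case $\frac{4\pi}{M}\,\boldsymbol{Y}_N\boldsymbol{Y}_N^T$ becomes the identity, so the scaled transpose already recovers the coefficients.

```python
import math

def real_sh_first_order(x, y, z):
    """Orthonormal real spherical harmonics up to N = 1 at the unit vector (x, y, z)."""
    c0 = 1 / math.sqrt(4 * math.pi)
    c1 = math.sqrt(3 / (4 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]

# Hypothetical microphone directions: octahedron vertices (a spherical 3-design)
mics = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
M = len(mics)
Yt = [real_sh_first_order(*m) for m in mics]     # thin M x (N+1)^2 matrix Y_N^T

def encode(p):
    """psi = (4 pi / M) * Y_N p, the t-design form of the left inverse."""
    return [4 * math.pi / M * sum(Yt[i][q] * p[i] for i in range(M))
            for q in range(4)]

# (4 pi / M) * Y_N Y_N^T should be the identity for a suitable t-design
gram = [[4 * math.pi / M * sum(Yt[i][a] * Yt[i][b] for i in range(M))
         for b in range(4)] for a in range(4)]

psi_true = [1.0, 0.2, -0.5, 0.3]                 # arbitrary test coefficients
p = [sum(Yt[i][q] * psi_true[q] for q in range(4)) for i in range(M)]
psi = encode(p)
print("recovered coefficients:", [round(v, 6) for v in psi])
```

For layouts that are not t-designs, the general form $(\boldsymbol{Y}_N\boldsymbol{Y}_N^T)^{-1}\boldsymbol{Y}_N$ of Eq. (6.21) would be needed instead of the scaled transpose.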


Finite-order assumption and spatial aliasing. An important implication of estimating $\psi_{nm}$ is that we need to assume that the distribution of the sound pressure is of limited spherical harmonic order on the measurement surface. This could be done by restricting the frequency range, as high-order harmonics are sufficiently attenuated below suitable frequency limits, cf. Fig. 6.5. However, low-pass filtered signals are unacceptable in practice. Instead, one has to accept spatial aliasing at high frequencies, i.e. directional mapping errors and direction-specific comb filters. Figure 6.8 shows spatial aliasing of $\boldsymbol{\psi}_N = (\boldsymbol{Y}_N^T)^\dagger\,\boldsymbol{p}$ in the angular domain, $p(\boldsymbol{\theta}) = \boldsymbol{y}_N^T(\boldsymbol{\theta})\,\boldsymbol{\psi}_N$.
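A quick estimate of where aliasing sets in follows from the rule of thumb $ka < N$ quoted in the caption of Fig. 6.8. The sketch below merely rearranges it to $f = \frac{N c}{2\pi a}$; the function names are hypothetical, and the values $a = 4.2$ cm, $N = 4$ and $c = 343$ m/s are the ones used elsewhere in this chapter.

```python
import math

def aliasing_limit_hz(order, radius_m, c=343.0):
    """Frequency at which ka equals the array order N: f = N c / (2 pi a)."""
    return order * c / (2 * math.pi * radius_m)

def ka_value(frequency_hz, radius_m, c=343.0):
    """Dimensionless product ka at a given frequency."""
    return 2 * math.pi * frequency_hz * radius_m / c

f_alias = aliasing_limit_hz(4, 0.042)
print("approximate aliasing limit / Hz:", round(f_alias))
print("ka at 1 kHz:", round(ka_value(1000.0, 0.042), 3))
```

The result of about 5.2 kHz agrees with the limit stated for the simulation in Fig. 6.13.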


6.8 Regularizing Filter Bank for Radial Filters 


The filters $i^n\,[(ka)^2\,h_n'^{(2)}(ka)]^{-1}$ of Fig. 6.5 exhibit an $n$th-order zero at 0 Hz, $ka = 0$. To retrieve the Ambisonic signals $\chi_{nm}$ from the sound pressure signals $\psi_{nm}$, their inverse would have an $n$-fold (unstable) pole at 0 Hz. Considering that microphone self noise and array imperfection cause erroneous signals louder than the acoustically expected $n$th-order vanishing signals around 0 Hz, such filter shapes will moreover cause an excessive boost of erroneous signals unless implemented with precaution. Filters of the different orders $n$ must be stabilized by high-pass slopes of at least the order $n$, see also [6, 9, 21-25]. With $(n+1)$th-order high-pass slopes, see Fig. 6.9, such errors are cut off by first-order high-pass slopes at exemplary cut-on frequencies of 90, 680, 1650, 2600 Hz for the Ambisonic orders 1, 2, 3, 4, yielding a noise boost of at most 20 dB for a 4th-order microphone with $a = 4.2$ cm. However, just cutting on the frequencies of each order is not enough: every cut-on frequency causes a noticeable loudness drop below it due to the discarded signal contributions. It is better to design a filter bank with crossovers instead, which allows compensation for the loudness loss in every band. A zero-phase, $n$th-order Butterworth high-pass response is defined by $H_{hi} = \frac{(\omega/\omega_c)^n}{1+(\omega/\omega_c)^n}$ and is amplitude-complementary to the low pass $H_{lo} = \frac{1}{1+(\omega/\omega_c)^n}$, so that $H_{hi} + H_{lo} = 1$.

³ md04.0025 on https://web.maths.unsw.edu.au/~rsw/Sphere/Images/MD/md_data.html.

Fig. 6.8 Interpolated plane-wave sound pressure image $\Re\{p\,e^{i\phi}\}$ on a rigid-sphere array with 25 microphones allowing decomposition up to the order $N = 4$; simulation uses orders up to 25, and the aliasing-free operation can only be expected within $kr < N$

Fig. 6.9 Filters $(ka)^2\,h_n'^{(2)}(ka)$/dB over frequency/Hz, regularized with $(n+1)$th-order high-pass filters

Fig. 6.10 Stabilizing filter bank/dB over frequency/Hz: signal orders $n > b$ are excluded from the band $b$

Using this filter type, the filter bank in Fig. 6.10 can be constructed as follows: The band-pass filters $H_b(\omega)$ are composed of a $(b+1)$th-order high- and $(b+2)$th-order low-pass skirt at $\omega_b$ and $\omega_{b+1}$, respectively, except for the band $b = 0$ (low-pass) and $b = N$ (high-pass)

$$H_b(\omega) = \frac{(\omega/\omega_b)^{b+1}}{1+(\omega/\omega_b)^{b+1}}\;\frac{1}{1+(\omega/\omega_{b+1})^{b+2}}, \qquad H_0(\omega) = \frac{1}{1+(\omega/\omega_1)^{2}}, \qquad H_N(\omega) = \frac{(\omega/\omega_N)^{N+1}}{1+(\omega/\omega_N)^{N+1}}. \tag{6.22}$$

To make the bands perfectly reconstructing, filters are normalized by the sum response

$$\frac{H_b(\omega)}{\sum_{b=0}^{N} H_b(\omega)}. \tag{6.23}$$

By adjusting the cut-on frequencies $\omega_b$ of the different orders $b = 1, \dots, N$, the noise and mapping behavior of the microphone array is adjusted; only the zeroth order is present in every band down to 0 Hz.

This filter bank design moreover allows adjusting loudness and sidelobe suppression in every frequency band separately.
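The construction of the filter bank can be sketched in a few lines. The code below is an illustration under the band definitions described above (a $(b+1)$th-order high-pass skirt at the cut-on of order $b$ and a $(b+2)$th-order low-pass skirt at the cut-on of order $b+1$), using the exemplary cut-on frequencies from the text; function names are hypothetical, and the magnitude responses are evaluated directly over frequency instead of being designed as FIR filters.

```python
def band_response(b, f, cut_on):
    """Unnormalized zero-phase response of band b at frequency f (Hz).

    cut_on[b - 1] is the cut-on frequency of order b, for b = 1..N.
    """
    n_max = len(cut_on)
    h = 1.0
    if b > 0:                                   # (b+1)th-order high-pass skirt
        x = (f / cut_on[b - 1]) ** (b + 1)
        h *= x / (1.0 + x)
    if b < n_max:                               # (b+2)th-order low-pass skirt
        h *= 1.0 / (1.0 + (f / cut_on[b]) ** (b + 2))
    return h

def filter_bank(f, cut_on):
    """Responses of all bands b = 0..N, normalized by their sum."""
    raw = [band_response(b, f, cut_on) for b in range(len(cut_on) + 1)]
    total = sum(raw)
    return [h / total for h in raw]

cut_on = [90.0, 680.0, 1650.0, 2600.0]          # Hz, orders 1..4 as in the text
for f in (20.0, 300.0, 1000.0, 2000.0, 10000.0):
    print(f, "Hz:", [round(h, 3) for h in filter_bank(f, cut_on)])
```

After normalization the band responses sum to one at every frequency, with band 0 dominating at the lowest frequencies and band $N$ at the highest.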


6.9 Loudness-Normalized Sub-band Side-Lobe 
Suppression 


The filter bank design shown above would only yield Ambisonic signals whose order increases with the frequency band. Ideally, this variation of the order comes with individual max-$r_E$ sidelobe suppression in every band. Moreover, Ambisonic signals of different orders are differently loud, so diffuse-field equalization of the $E$ measure is also desirable in every band.

To fulfill the above constraints, we propose to use the following set of FIR filter responses as given in [26, 27], which are modified by a filter bank employing diffuse-field normalized max-$r_E$ weights in separate frequency bands $b = 0, \dots, N$, cf. Fig. 6.11, with the $n$th order discarded for bands $b < n$:

$$\rho_n(\omega) = \Big[\sum_{b=n}^{N} a_{n,b}\,H_b(\omega)\Big]\;i^{-n}\,(ka)^2\,h_n'^{(2)}(ka)\;e^{ika}. \tag{6.24}$$

Here, $e^{ika}$ removes the linear phase of $h_n'^{(2)}(ka)$, and $a_{n,b}$ is the set of diffuse-field ($\sqrt{E}$) equalized max-$r_E$ weights for the band $b$ in which the Ambisonic orders retrieved are $0 \le n \le b$

$$a_{n,b} = \begin{cases} \dfrac{P_n\big(\cos\frac{137.9^\circ}{b+1.51}\big)}{\sqrt{\sum_{n'=0}^{b}(2n'+1)\,\big[P_{n'}\big(\cos\frac{137.9^\circ}{b+1.51}\big)\big]^2}}, & \text{for } n \le b, \\[2ex] 0, & \text{otherwise.} \end{cases} \tag{6.25}$$

Figure 6.12 shows the polar patterns of the corresponding direction-spread functions.
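The band-wise weights can be tabulated with a short script. This sketch assumes the max-$r_E$ weights $P_n\big(\cos\frac{137.9^\circ}{b+1.51}\big)$ and a normalization to $E = \sum_n (2n+1)\,a_{n,b}^2 = 1$, as suggested by the caption of Fig. 6.12; the function names are hypothetical.

```python
import math

def legendre(n_max, c):
    """Legendre polynomials P_0(c)..P_n_max(c) by the three-term recurrence."""
    p = [1.0, c]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * c * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]

def max_re_weights(b, n_total):
    """Diffuse-field (E = 1) normalized max-rE weights a_{n,b} for n = 0..n_total."""
    c = math.cos(math.radians(137.9 / (b + 1.51)))
    p = legendre(b, c)
    norm = math.sqrt(sum((2 * n + 1) * p[n] ** 2 for n in range(b + 1)))
    return [p[n] / norm if n <= b else 0.0 for n in range(n_total + 1)]

N = 4
for b in range(N + 1):
    print("band", b, [round(a, 3) for a in max_re_weights(b, N)])
```

Each row holds the weights of one band; orders above the band index are set to zero, which is how the $n$th order is discarded for bands $b < n$.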

For the implementation of $\rho_n(\omega)$ by fast block filtering, $\omega = 2\pi f$ and $k = \omega/c$ are uniformly sampled in frequency, and the inverse discrete Fourier transform yields the associated impulse responses (attention: the value at 0 Hz must be replaced for stable results, and cyclic time-domain shifts and windows are necessary).

The direction-spread function of a plane-wave sound pressure mapped to a directional Ambisonic signal becomes frequency-dependent as shown in Fig. 6.13, and it has minimal side lobes.

Fig. 6.11 Filter-bank-regularized/dB over frequency/Hz, diffuse-field equalized max-$r_E$ weighted spherical microphone array responses using $i^n\,\rho_n(\omega) = \sum_b a_{n,b}\,H_b(\omega)\,(ka)^2\,h_n'^{(2)}(ka)$

Fig. 6.12 Diffuse-field equalized (to $E = 1$) max-$r_E$ direction-spread functions; even orders are plotted on the upper, odd orders on the lower semi-circle

Fig. 6.13 Direction spread/dB over frequency/Hz in zenithal cross section/degrees through the Ambisonic signal of a simulated microphone processing response to a plane wave from zenith, with the parameters $a = 4.2$ cm, $M = 25$ microphones, max-$r_E$-weighted in bands with 90, 680, 1650, 2600 Hz for the cut-on of the orders 1, 2, 3, 4. Simulation is done with the order $N_\mathrm{sim} = 30$ and spatial aliasing will occur above 5.2 kHz. Gain matching was assumed to be up to < ±0.5 dB accurate; the map shows the direction spread normalized to its value at 0° for every frequency to make its shape easier to read


6.10 Influence of Gain Matching, Noise, Side-Lobe 
Suppression 


Typical gain matching between the microphones is not always more accurate than 0.5 dB. The result is that the physically dominant omnidirectional signal will leak into the higher-order signals by directionally random gain variations. However, acoustically, higher-order components are expected to be weak and to require amplification. The effect on mapping is equivalent to that of microphone self noise; however, gain mismatch yields a correlated signal exciting the microphones, whereas self-noise yields low-frequency noise.

If regularization filters were set to 50, 160, 500, 1600 Hz and sidelobe suppression turned off for testing, one would get the poor image as in Fig. 6.14a, where high-order signals at low frequencies are highly boosted.

If a noise-free case is assumed, and only the max-$r_E$ sidelobe suppression of the highest band is used for all bands, one gets the image in Fig. 6.14b, which improves with individual max-$r_E$ weights in Fig. 6.14c.

(a) 50, 160, 500, 1600 Hz cut-on with < ±0.5 dB gain matching, no sidelobe suppression

(b) 50, 160, 500, 1600 Hz cut-on with only 4th-order sidelobe suppression, assuming perfect gain match

(c) 50, 160, 500, 1600 Hz cut-on with individual max-$r_E$ sidelobe suppression per band, assuming perfect gain match

Fig. 6.14 Influence of carelessly selected cut-on frequencies for regularization (top), and of non-individual sidelobe suppression per band (middle), in contrast to ideal results (bottom); the maps show direction spreads normalized to their values at 0° for every frequency to make side lobes easier to read


Self-noise behavior. Assuming that the self-noise of the microphones is uncorrelated, it will also remain uncorrelated and of equal strength after decomposing the $M$ microphone signals $p_i = \mathcal{N}$ into the $(N+1)^2$ spherical harmonic coefficient signals $\psi_{nm} \propto \mathcal{N}$, if $M \approx (N+1)^2$ and the microphone arrangement permits a well-conditioned pseudo inversion $(\boldsymbol{Y}_N^T)^\dagger$. The spectral change of the microphone self noise due to the radial filters $\rho_n(\omega)$ can be described by the noise of the $(2n+1)$ signals of the same order, amplified by $|\rho_n(\omega)|^2$, in comparison to the zeroth-order signal:

$$|G(\omega)|^2 = \frac{\sum_{n=0}^{N}(2n+1)\,|\rho_n(\omega)|^2}{\big|(ka)^2\,h_0'^{(2)}(ka)\big|^2}. \tag{6.26}$$

Figure 6.15 analyzes the noise amplification for the simulation example (max-$r_E$ weighting in each sub band, $a = 4.2$ cm) and shows the dependency on exemplary cut-on frequencies configured to tune the filter bank to 0, 5, 10, 15, and 20 dB noise boosts. The trade-off here is: the more noise boost one can allow, the more directional resolution one gets, see Fig. 6.16.

Fig. 6.15 Self-noise modification $|G(\omega)|^2$/dB over frequency/Hz for the filter bank configurations using the cut-on frequencies 2k, 3k, 4k, 5k (no noise amplification), 600, 2k, 3.5k, 4.2k (5 dB noise amplification), 280, 1.3k, 2.6k, 3.6k (10 dB noise amplification), 150, 950, 2k, 3.15k (15 dB noise amplification), and 90, 680, 1.65k, 2.6k (20 dB noise amplification)

Open measurement data (SOFA format) characterizing the directivity patterns of the 32 Eigenmike em32 transducers are provided under the link http://phaidra.kug.ac.at/o:69292. They are measured on a 12° × 11.25° azimuth × zenith grid, yielding 480 × 256 pt impulse responses for each of the 32 transducers.


6.11 Practical Free-Software Examples 


6.11.1 Eigenmike Em32 Encoding Using Mcfx and IEM 
Plug-In Suites 


We give a practical signal processing example for the Eigenmike em32 which is applicable e.g. in digital audio workstations. First, the 32 signals are encoded by matrix multiplication (IEM MatrixMultiplier), cf. Fig. 6.17a, yielding 25 fourth-order signals. The preset (json file) is provided online at http://phaidra.kug.ac.at/o:79231. The radial filtering that sharpens the surround sound image uses mcfx_convolver, see Fig. 6.17b, with 25 SISO filters, one for each Ambisonic signal, using the 5 different filter curves for the orders n = 0, ..., 4 as defined above. The convolver presets (wav files and config files for mcfx_convolver) are provided online at http://phaidra.kug.ac.at/o:79231 and are available for the different noise boosts 0, 5, 10, 15, 20 dB.

As found in [28], the em32 transducers exhibit a frequency response that favors low frequencies and attenuates high frequencies. This behavior is sufficiently well equalized in practice using two parametric shelving filters, a low shelf at 500 Hz with a gain of −5 dB, and a high shelf at 5 kHz using a gain of +5 dB, see Fig. 6.18.

Fig. 6.16 Direction spread/dB over frequency/Hz and zenith/degrees of the filter bank with different settings to achieve 0, 5, 10, 15, 20 dB noise boosts; the maps show direction spreads normalized to their values at 0° at every frequency as above

Fig. 6.17 IEM MatrixMultiplier encoding the Eigenmike em32 signals and mcfx_convolver applying radial filters to the encoded em32 recording

Fig. 6.18 Practical equalization of the em32 transducer characteristics by two parametric shelving filters of the mcfx_filter, cf. [28]


6.11.2 SPARTA Array2SH 


The SPARTA suite by Aalto University includes the Array2SH plug-in shown in Fig. 6.19 to convert the transducer signals of a microphone array into Ambisonics. It provides both encoding of the signals and calculation and application of radial-focusing filters based on the geometry of the array. It supports rigid and open arrays and comes with presets for several arrays, such as the Eigenmike em32. The plug-in allows adjusting the radial filters in terms of regularization type and maximum gain. The Reg. Type called Z-Style corresponds to the linear-phase design of Sect. 6.9.

Fig. 6.19 SPARTA Array2SH encoding for, e.g., em32 (screenshot; controls include Presets, No. Sensors, Array r (m), Sensor r (m), Encoding Order, c (m/s), Array Type, Weight Type, Reg. Type, Max Gain (dB), Post Gain (dB), CH Order, Normalisation, Max Freq. (Hz))

Chapter 7
Compact Spherical Loudspeaker Arrays


Le haut-parleur anonymise la source réelle. 
Pierre Boulez [1], ICA, 1983. 
... adjustable radiation naturalize[s] alien sounds by embedding 
them in the natural spatiality of the room. 
OSIL project [2], InSonic, 2015. 


Abstract This chapter introduces auditory objects that can be created by adjustable- 
directivity sources in rooms. After showing basic positioning properties in distance 
and direction, we describe physical first- and higher-order spherical loudspeaker 
arrays and their control, such as the loudspeaker cubes or the icosahedral loudspeaker 
(IKO). Not only static auditory objects, but such traversing space by their time- 
varying beam forming are considered here. Signal dependency and different practical 
setups are discussed and briefly analyzed. This young Ambisonic technology brings 
new means of expression to sound reinforcement, electroacoustic or computer music. 


While surrounding Ambisonic loudspeaker arrays play sound from outside the 
listening area into the audience, compact spherical loudspeaker arrays play sound 
into the room from a single position. Directivity adjustable in orientation and shape 
can be used to steer sound beams in order to excite wall reflections in the given, 
acoustic environment. The directional shapes and orientations of such beams are 
all controlled by—guess what—Ambisonic signals. Despite the huge practical dif- 
ference, both applications do not only share the spherical harmonics that lend their 
shapes to Ambisonic signals: The control of radiating sound beams employs nearly 
the same model- or measurement-based radial steering filters as those of compact 
higher-order Ambisonic microphones. 


The works of Warusfel [3], Kassakian [4], Avizienis [5], Zotter [6, 7], Pomberger 
[8], Pollow [9], Mattioli Pasqual [10] established the electroacoustic background 
technology required to describe compact spherical loudspeaker arrays built with 
electrodynamic transducers. The early works on auditory objects were written by 
Schmeder [11], Sharma, Frank, and Zotter [2, 12, 13]. And some contemporary 
results were found in the project “Orchestrating the Space by Icosahedral Loud- 
speaker” (OSIL) between 2015 and 2018 [14-19]. 
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7.1 Auditory Events of Ambisonically Controlled 
Directivity 


7.1.1 Perceived Distance 


Laitinen showed in [20] that increasing the directivity of a listener-facing loudspeaker 
array from omnidirectional to second order was able to create auditory events that 
were perceptually closer than the physical distance to the loudspeaker array. The 
experimental results can be explained by the increase of the direct-to-reverberant 
energy ratio, as the sound beam of the directional source does not as much excite 
room reflections. 

Wendt extended Laitinen’s work by experiments employing a simulation of a 
third-order directional source in a virtual room (third-order image source model) 
played back by a loudspeaker ring in an anechoic room [16]. He could show that the 
perceived distance between the listener and the higher-order directional source could 
not only be controlled by the order of the directivity pattern but also by the orientation 
of the source (towards the listener, away from the listener). Beams projecting sounds 
away from the listener were perceived behind the source, cf. Fig.7.1. Again, the 
perceptual results could be modeled by simple measures known from room acoustics. 


7.1.2 Perceived Direction 


Using a similar room simulation, the study in [21] asked participants to indicate the 
perceived direction of an auditory event created by a third-order directional source. 
The results showed that for different source orientations, listeners perceived auditory 
objects at directions that often did not coincide with the sound source, but with the 
delayed reflection paths, cf. Fig. 7.2. Perceived directions focused on the direct sound 
and the first three reflections after 6, 8, and 9 ms. For some orientations, even
the second-order reflections at 12 and 14 ms dominated localization. However,
the influence of later reflections is reduced by the precedence effect. The perceived
directions can be modeled by the extended energy vector originally developed for off-
center listening positions in surrounding loudspeaker arrangements, as also shown
in [17]. Experiments in [22] showed that panning between a reflection and the direct
sound creates auditory objects in between. When applying the appropriate delay
and gain to the direct sound to compensate for the longer path of the reflection, the
localization curves are similar to those of standard stereo using a pair of loudspeakers.

Fig. 7.2 IKO perceived directions (black circles, radii indicate relative amount of answers) and
modeling (gray crosses), 3rd-order max-$r_\mathrm{E}$ beam, 2nd-order image source model. Gray shading in
the background indicates level of each path
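To make the idea of an energy-vector model more tangible, the following Python sketch computes the plain (non-extended) energy vector of a few propagation paths, each given by an azimuth and a level. It is only an illustration under stated assumptions: the function name and the input format are hypothetical, and the additional delay- and direction-dependent weighting of the extended energy vector used in [17] is not reproduced here.

```python
import math


def energy_vector(paths):
    """Plain energy vector of horizontal paths given as (azimuth_deg, level_db).

    Returns (direction in degrees, vector length between 0 and 1).
    Assumption: no precedence/delay weighting, i.e. NOT the extended model.
    """
    ex = ey = e_sum = 0.0
    for azimuth_deg, level_db in paths:
        energy = 10.0 ** (level_db / 10.0)  # level in dB -> energy weight
        ex += energy * math.cos(math.radians(azimuth_deg))
        ey += energy * math.sin(math.radians(azimuth_deg))
        e_sum += energy
    return math.degrees(math.atan2(ey, ex)), math.hypot(ex, ey) / e_sum
```

Two equally strong paths at +30° and -30°, for instance, yield a direction of 0° and a vector length of cos 30°, whereas a single path always yields a length of one.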


7.2 First-Order Compact Loudspeaker Arrays and Cubes 


The simplest way of creating a loudspeaker array with adjustable directivity in a 
practical sense is a cube with loudspeakers on its plane surfaces, as suggested by 
Misdariis [23]. Restricting the directivity control to two dimensions reduces the
number of loudspeaker drivers to four and makes it easy to equip the array with a carrying
handle on top and a flange adapter at the bottom, cf. [24] and Fig. 7.3.


Directivity control. First-order Ambisonics utilizes monopole and dipole modes,
which directly translate to the corresponding far-field radiation patterns. Owing to the
cubic shape, these modes can easily be created by driving either all four drivers in
phase or the opposing drivers out of phase, cf. Fig. 7.4. Nevertheless, the frequency
responses of such monopole and dipole modes need to be equalized to enable their
phase- and magnitude-aligned superposition in the far field. Filters and measurement
data of cube loudspeakers built at IEM [24] are freely available at
http://phaidra.kug.ac.at/o:67631.


156 7 Compact Spherical Loudspeaker Arrays 


(a) loudspeaker cube (b) vertical cross section (c) horizontal cross section


Fig. 7.3 Design of a loudspeaker cube: prototype, and vertical and horizontal cross section plots

Fig. 7.4 System controlling the monopole and dipole modes of the loudspeaker cubes, to accomplish
first-order beamforming with the shape parameter $\alpha$ and beam direction $\varphi_0$


To overcome the compressive effort of interior volume changes at low frequencies, 
the filter Hyct in Fig. 7.4 equalizes the smaller velocity of the loudspeaker cones when 
driven omnidirectionally to the velocity when driven in dipoles as a first step, and 
as a second step, it attenuates the monopole pattern slightly to account for its more 
efficient radiation at low frequencies. The filter $H_\mathrm{EQ}$ is a general equalizer required
to obtain a flat frequency response, $0 \le \alpha \le 1$ is a first-order omni-to-dipole
beam-shape parameter, and $\varphi_0$ is the beam direction. The filter Hy. can be specified as a
5th-order IIR filter purely based on geometric and electroacoustic parameters [19]. 
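The role of the beam-shape parameter and the beam direction can be illustrated with a few lines of Python. The sketch below assumes the common first-order pattern $g(\varphi) = (1-\alpha) + \alpha\cos(\varphi - \varphi_0)$, which blends an omnidirectional pattern ($\alpha = 0$) with a dipole ($\alpha = 1$); this formula and the function names are assumptions made for illustration, and the equalization filters of Fig. 7.4 are left out entirely.

```python
import math


def first_order_beam(phi_deg, alpha, phi0_deg):
    """Assumed first-order pattern (1 - alpha) + alpha * cos(phi - phi0)."""
    return (1.0 - alpha) + alpha * math.cos(math.radians(phi_deg - phi0_deg))


def mode_weights(alpha, phi0_deg):
    """Weights of the monopole, x-dipole, and y-dipole modes for that beam."""
    phi0 = math.radians(phi0_deg)
    return (1.0 - alpha, alpha * math.cos(phi0), alpha * math.sin(phi0))
```

With $\alpha = 0.5$ the sketch yields a cardioid with its null opposite to the beam direction; superimposing the monopole and the two dipoles with the returned weights reproduces the same pattern for any $\varphi_0$.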


Direct and indirect sound with two cubes. The study in [19] examined the width 
of the listening area for the creation of a central auditory object between a pair of 
loudspeaker cubes, cf. Fig. 7.5. Steering the two beams directly at the listener yielded
a narrow listening area that increased with the distance to the loudspeakers, similar
to what is known from typical stereo applications, cf. Fig. 2.9. A much wider listening area
is achieved by steering the beams to the front wall to excite reflections. To this end,
max-$r_\mathrm{E}$ (super-cardioid) beams were chosen and oriented in a way to ideally suppress
direct sound from the loudspeaker cubes at the listening position. The proposed setup 
of two loudspeaker cubes can be used to play back stable L, C, R channels of a 
surround production without the need of an actual center loudspeaker. 


Surround with depth. Together with the distance control described by Laitinen [20],
the stable in-between auditory image has been used in [19] to establish a surround-
with-depth system consisting of a quadraphonic setup of four loudspeaker cubes. As
first layer, it uses the direct sounds from the 4 loudspeakers at ±45° and ±135°
together with the 4 in-between images at 0°, ±90°, and 180° to obtain 8 directions
for third-order Ambisonic surround panning. As a second layer for depth, surround
with depth uses 4 cardioid beams pointing into the 4 room corners to provide the
impression of distant sounds. Blending between these two layers is used to control
the distance impression of surround sounds.

Fig. 7.5 Width of the listening area for a central auditory object at two distances from a pair of
loudspeaker cubes with different orientation of max-$r_\mathrm{E}$/super-cardioid beams


7.3 Higher-Order Compact Spherical Loudspeaker Arrays 
and IKO 


With transducers mounted on spheres or polyhedra, higher-order radiators can be 
built. Typically, those are Platonic solids such as dodecahedra or icosahedra, as they 
can easily be manufactured from equal-sided polygons, cf. Fig. 7.6. Often, the loud-
speakers are also mounted onto a common interior volume. Hereby, the higher-order
modes can be controlled at reduced impedance of the inner stiffness; however, this
also causes acoustic coupling of the transducer motions. Typically, multiple-input-
multiple-output (MIMO) crosstalk cancellers are employed to suppress the coupling
and to control the velocity of the transducer cones. If this is accomplished, the 
acoustic radiation can be modeled and equalized by the spherical cap model, 
cf. [6, 15, 25, 26]. 
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Fig. 7.6 Powerful icosahedral loudspeaker array (IKO by IEM and Sonible) and reflecting baffles 
in Ligeti concert hall, in preparation of an electroacoustic music concert 


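Cap model. Higher-order loudspeaker arrays on a compact spherical housing are
modeled by the spherical cap model. It assumes for the exterior air that the radial
surface velocity is a boundary condition consisting of separated spherical cap shapes
of the size $\alpha$ centered around the directions $\{\boldsymbol\theta_l\}$, each unity in value. These idealized
transducer shapes driven by the transducer velocities $v_l$ compose the surface velocity

$$ v(\boldsymbol\theta) = \sum_{l=1}^{L} u\big(\boldsymbol\theta_l^{\mathrm T}\boldsymbol\theta - \cos\tfrac{\alpha}{2}\big)\, v_l. \tag{7.1} $$

Here, $u(\zeta)$ denotes the unit step function that is unity for $\zeta \ge 0$ and zero otherwise.
The surface velocity distribution can be decomposed into spherical harmonics as

$$ v(\boldsymbol\theta) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} Y_n^m(\boldsymbol\theta) \sum_{l=1}^{L} w_{nm}^{(l)}\, v_l. \tag{7.2} $$

The coefficients $w_{nm}^{(l)}$ of the $l$th cap are defined by spherical convolution Eq. (A.56)
of a Dirac delta $\delta(\boldsymbol\theta_l^{\mathrm T}\boldsymbol\theta - 1)$ pointing to the cap center with a zenithal cap
$u(\cos\vartheta - \cos\tfrac{\alpha}{2})$:

$$ w_{nm}^{(l)} = w_n\, Y_n^m(\boldsymbol\theta_l), \tag{7.3} $$

where $Y_n^m(\boldsymbol\theta_l)$ are the coefficients expressing the Dirac delta, extended to a cap by
weighting with $w_n$. The term $w_n = 2\pi \int_{\cos\frac{\alpha}{2}}^{1} P_n(\zeta)\,\mathrm{d}\zeta$ is derived in Eq. (A.60)

$$ w_n = 2\pi \begin{cases} \dfrac{P_{n-1}(\cos\frac{\alpha}{2}) - \cos\frac{\alpha}{2}\, P_n(\cos\frac{\alpha}{2})}{n+1}, & \text{for } n > 0,\\[2ex] 1 - \cos\frac{\alpha}{2}, & \text{for } n = 0. \end{cases} \tag{7.4} $$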
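The cap-shape coefficients are easy to evaluate numerically. The following Python sketch assumes the closed form $w_n = 2\pi\,[P_{n-1}(x) - x\,P_n(x)]/(n+1)$ for $n > 0$ and $w_0 = 2\pi(1 - x)$ with $x = \cos\frac{\alpha}{2}$, which follows from integrating the Legendre polynomial $P_n$ from $x$ to 1; the function names are hypothetical and chosen only for this illustration.

```python
import math


def legendre(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] using the three-term recurrence."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]


def cap_coefficients(n_max, alpha):
    """Cap coefficients w_0..w_n_max for a cap of aperture angle alpha (radians)."""
    x = math.cos(alpha / 2.0)
    p = legendre(max(n_max, 1), x)
    w = [2.0 * math.pi * (1.0 - x)]  # n = 0: area of the cap on the unit sphere
    for n in range(1, n_max + 1):
        w.append(2.0 * math.pi * (p[n - 1] - x * p[n]) / (n + 1))
    return w
```

As a sanity check, a "cap" covering the entire sphere ($\alpha = 2\pi$) gives $w_0 = 4\pi$ and vanishing coefficients for all $n > 0$, as expected for a constant function on the sphere.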
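Decoder. Without radiation control yet, any low-order target spherical harmonic
$n \le N$ can be synthesized as velocity pattern $\phi_{nm}$ by superimposing the spherical cap
coefficients $w_{nm}^{(l)}$ with suitable transducer velocities $v_l$, i.e. $\phi_{nm} = \sum_{l} w_n Y_n^m(\boldsymbol\theta_l)\, v_l$.
We write a matrix/vector notation with the matrix $\boldsymbol Y = [\boldsymbol y(\boldsymbol\theta_1), \dots, \boldsymbol y(\boldsymbol\theta_L)]$ containing
the spherical harmonics $\boldsymbol y(\boldsymbol\theta) = [Y_n^m(\boldsymbol\theta)]_{nm}$ sampled at the transducer positions $\{\boldsymbol\theta_l\}$
to represent Dirac deltas pointing there, and $\boldsymbol w = [w_n]_{nm}$ to represent the cap shape,

$$ \boldsymbol\phi = \mathrm{diag}\{\boldsymbol w\}\, \boldsymbol Y\, \boldsymbol v. \tag{7.5} $$

As long as the order $N$ up to which coefficients are controlled is low enough, $L \ge
(N + 1)^2$, and transducers are well-distributed, perfect control is feasible. The corre-
sponding velocities are found by solving a least-squares problem, see Appendix A.4,
Eq. (A.63), yielding the right inverse of the $N$th-order cap-coefficient matrix,

$$ \boldsymbol v = \boldsymbol Y_N^{\mathrm T} (\boldsymbol Y_N \boldsymbol Y_N^{\mathrm T})^{-1}\, \mathrm{diag}\{\boldsymbol w_N\}^{-1} \boldsymbol\phi_N = \boldsymbol D\, \mathrm{diag}\{\boldsymbol w_N\}^{-1} \boldsymbol\phi_N. \tag{7.6} $$

The right inverse $\boldsymbol D = \boldsymbol Y_N^{\mathrm T} (\boldsymbol Y_N \boldsymbol Y_N^{\mathrm T})^{-1}$ is a mode-matching decoder, cf. Eq. (4.40).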
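A dependency-free Python sketch may help to see what the mode-matching decoder does. It builds the right inverse $D = Y^{\mathrm T}(Y Y^{\mathrm T})^{-1}$ for first-order harmonics only; the real-valued harmonics in ACN order with N3D scaling, the small Gauss-Jordan solver, and all function names are assumptions made for illustration rather than the implementation used for actual arrays.

```python
import math


def first_order_sh(direction):
    """Real first-order harmonics of a unit vector (assumed ACN order, N3D scaling)."""
    x, y, z = direction
    s = math.sqrt(3.0)
    return [1.0, s * y, s * z, s * x]


def solve(a, b):
    """Solve a @ X = b for square a (lists of rows) by Gauss-Jordan elimination."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mode_matching_decoder(directions):
    """Right inverse D = Y^T (Y Y^T)^-1, returned as L rows of (N+1)^2 entries."""
    yt = [first_order_sh(d) for d in directions]  # rows of Y^T, one per transducer
    k = len(yt[0])
    gram = [[sum(c[i] * c[j] for c in yt) for j in range(k)] for i in range(k)]
    identity = [[float(i == j) for j in range(k)] for i in range(k)]
    gram_inv = solve(gram, identity)
    return [[sum(c[i] * gram_inv[i][j] for i in range(k)) for j in range(k)] for c in yt]
```

For six transducers on the faces of a cube (directions ±x, ±y, ±z), multiplying the harmonics matrix with the resulting decoder gives the 4 × 4 identity, i.e. every first-order pattern is synthesized exactly; dividing the target coefficients by $w_n$ beforehand accounts for the cap shape.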


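Exterior problem. The radiated sound pressure is described by the exterior problem
denoted by the coefficients $c_{nm}$ in Eq. (6.13) and the spherical Hankel functions
$h_n^{(2)}(kr)$. To relate it to a time-derived surface velocity at the array radius $r = a$, we
derive the exterior solution with regard to radius, $\frac{\partial p}{\partial r} = k \frac{\partial p}{\partial kr} = -\mathrm{i}k\,c\rho\, v$, cf. Eq. (6.2),

$$ v(\boldsymbol\theta) = \frac{\mathrm i}{\rho c} \sum_{n=0}^{\infty}\sum_{m=-n}^{n} h_n'^{(2)}(ka)\, Y_n^m(\boldsymbol\theta)\, c_{nm}. \tag{7.7} $$

Comparing Eq. (7.2) to Eq. (7.7) yields $c_{nm} = \rho c\, [\mathrm{i}\, h_n'^{(2)}(ka)]^{-1} \sum_{l=1}^{L} w_n Y_n^m(\boldsymbol\theta_l)\, v_l$,
the coefficients to calculate the radiated pressure. Far away, we replace the spher-
ical Hankel function that approaches $h_n^{(2)}(kr) \to \mathrm{i}^{n+1} k^{-1} \frac{e^{-\mathrm{i}kr}}{r}$ by the term $\mathrm{i}^{n+1} k^{-1}$
in Eq. (6.13) so that the radiated far-field sound pressure $p \propto \sum_{n,m} \mathrm{i}^{n+1} k^{-1}\, Y_n^m\, c_{nm}$
becomes

$$ p(\boldsymbol\theta) \propto \sum_{n=0}^{\infty}\sum_{m=-n}^{n} Y_n^m(\boldsymbol\theta)\, \frac{\mathrm{i}^{n}\, w_n}{k\, h_n'^{(2)}(ka)} \sum_{l=1}^{L} Y_n^m(\boldsymbol\theta_l)\, v_l. \tag{7.8} $$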
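The order-dependent term of the far-field pressure can be evaluated with the standard library alone. The sketch below uses the finite closed-form series of the spherical Hankel function of the second kind and the recurrence $h_n' = \frac{n}{x} h_n - h_{n+1}$ for its derivative; the function names are hypothetical, and the term $\mathrm{i}^n / (k\, h_n'^{(2)}(ka))$ is taken as the factor that multiplies $w_n$ in the far-field expression above.

```python
import cmath
import math


def sph_hankel2(n, x):
    """Spherical Hankel function of the second kind h_n^(2)(x) for x > 0."""
    series = sum(
        (-1j) ** m * math.factorial(n + m)
        / (math.factorial(m) * math.factorial(n - m) * (2.0 * x) ** m)
        for m in range(n + 1)
    )
    return 1j ** (n + 1) * cmath.exp(-1j * x) / x * series


def sph_hankel2_diff(n, x):
    """Derivative of h_n^(2)(x) via the recurrence h_n' = (n/x) h_n - h_{n+1}."""
    return n / x * sph_hankel2(n, x) - sph_hankel2(n + 1, x)


def radiation_term(n, k, a):
    """Order-n radiation term i^n / (k h_n'^(2)(ka)) for wavenumber k and radius a."""
    return 1j ** n / (k * sph_hankel2_diff(n, k * a))
```

For large arguments, `sph_hankel2(n, x) * x * cmath.exp(1j * x)` approaches $\mathrm{i}^{n+1}$, which is the far-field approximation used in the text. For small $ka$, the magnitude of `radiation_term` drops steeply with the order $n$, which is why higher orders need to be cut on only above a certain frequency.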


7.3.1  Directivity Control 


The spherical harmonics coefficients of the far-field sound pressure pattern in
Eq. (7.8) are controlled by the cap velocities $v_l$

$$ \psi_{nm} = \frac{\mathrm{i}^{n}\, w_n}{k\, h_n'^{(2)}(ka)} \sum_{l=1}^{L} Y_n^m(\boldsymbol\theta_l)\, v_l, \tag{7.9} $$

and we desire to form the directional sound beam they represent according to a
max-$r_\mathrm{E}$ pattern $a_n Y_n^m(\boldsymbol\theta_0)$ yielding radiation focused towards $\boldsymbol\theta_0$

$$ \psi_{nm} = a_n\, Y_n^m(\boldsymbol\theta_0). \tag{7.10} $$

To find suitable cap velocities $v_l$, we equate the model Eqs. (7.9) and (7.10). In
matrix/vector notation, the equation is

$$ \mathrm{diag}\{[\mathrm{i}^{n}\, w_n\, k^{-1} / h_n'^{(2)}(ka)]_{nm}\}\, \boldsymbol Y \boldsymbol v = \mathrm{diag}\{[a_n]_{nm}\}\, \boldsymbol y(\boldsymbol\theta_0). \tag{7.11} $$

The diagonal matrix on the left is easy to invert, and for patterns up to the order
$n \le N$, the mode-matching decoder $\boldsymbol D$ of Eq. (7.6) already gives us a way to define
velocities inverting the matrix $\boldsymbol Y_N$ from the right. The preliminary solution becomes

$$ \Rightarrow \quad \boldsymbol v = \boldsymbol D\, \mathrm{diag}\{[\mathrm{i}^{-n}\, w_n^{-1}\, k\, h_n'^{(2)}(ka)\, a_n]_{nm}\}\, \boldsymbol y_N(\boldsymbol\theta_0). \tag{7.12} $$


On-axis equalized, sidelobe-suppressing directivity control limiting the excursion.
The inverse cap shape coefficient $w_n^{-1}$ and the max-$r_\mathrm{E}$ weight $a_n$ can be regarded as a
part of the radiation control filters $\mathrm{i}^{-n}\, k\, h_n'^{(2)}(ka)$. The expression $\mathrm{i}^{-n-1} (ka)^2\, h_n'^{(2)}(ka)$
of compact spherical microphone arrays (Sect. 6.6) qualitatively differs by a factor
$k$. Practical implementation of radiation control filters and their regularization is
therefore quite similar to radial filters of spherical microphone arrays. There are
three main differences, as explained in [15]:


With loudspeaker arrays, it is rather the excursion that is limited, which primarily 
entails a different strategy of adjusting the filter bank cut-on frequencies, which due 
to size are at lower frequencies where group-delay distortions are less disturbing, 
and linear-phase implementations would cause avoidably long delays. 

Moreover, instead of cut-on filter slopes of (n + 1)th order required for noise 
removal in signals obtained from spherical microphone arrays, limited excursion 
requires cut-on slopes of at least (n + 3)th order, i.e. 4th order to cut on the 1st- 
order Ambisonic signals. Thereof, one additional order is caused by the qualitative 
difference of $k^{-1}$ in radial filters, and another order by the conversion of velocity
to excursion by a factor $(\mathrm{i}\omega)^{-1}$.

Finally, instead of diffuse-field equalization that is useful for surround sound play- 
back of spherical microphone array signals, it is more useful to equalize spherical 
sound beams on-axis (free field). 


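On-axis equalization yields a different scaling of the sub-band max-$r_\mathrm{E}$ weights

$$ a_{n,b} = \begin{cases} P_n\big(\cos\tfrac{137.9°}{b+1.51}\big)\, \dfrac{\sum_{n'=0}^{N} (2n'+1)\, P_{n'}\big(\cos\tfrac{137.9°}{N+1.51}\big)}{\sum_{n'=0}^{b} (2n'+1)\, P_{n'}\big(\cos\tfrac{137.9°}{b+1.51}\big)}, & \text{for } n \le b,\\[2ex] 0, & \text{otherwise.} \end{cases} \tag{7.13} $$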
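The effect of on-axis equalization can be checked numerically. The sketch below assumes max-$r_\mathrm{E}$ weights $P_n(\cos\frac{137.9°}{b+1.51})$ for a band of order $b$ and rescales them so that the on-axis sum $\sum_n (2n+1)\,a_{n,b}$ of every band equals that of the full-order beam; this normalization is one reading of the on-axis equalized weights, and all function names are hypothetical.

```python
import math


def legendre(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] using the three-term recurrence."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]


def max_re_weights(order):
    """max-rE weights a_0..a_order for the given Ambisonic order."""
    x = math.cos(math.radians(137.9) / (order + 1.51))
    return legendre(order, x)


def on_axis_sum(weights):
    """On-axis amplitude of an axisymmetric beam, up to a constant factor."""
    return sum((2 * n + 1) * a for n, a in enumerate(weights))


def on_axis_equalized_weights(n_max):
    """table[b][n]: weights of band b, zero for n > b, equal on-axis sum in all bands."""
    reference = on_axis_sum(max_re_weights(n_max))
    table = []
    for b in range(n_max + 1):
        a = max_re_weights(b)
        scale = reference / on_axis_sum(a)
        table.append([scale * an for an in a] + [0.0] * (n_max - b))
    return table
```

With this scaling, a band that only contains low orders is raised in level so that the beam keeps the same on-axis amplitude when fewer orders are active at low frequencies.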
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Typically, cut-on frequencies for compact spherical loudspeaker arrays are low, 
and linear-phase filterbanks would require long pre-delays. It is useful to employ 
Linkwitz-Riley filters for the crossovers, to get a low-latency implementation. To
emphasize the similarity to Eq. (6.22), we write Linkwitz-Riley filters [27] as com-
bination of an all-pass $A^m$ with twice the phase response of an $m$th-order But-
terworth low-pass combined either with the magnitude-squared low-pass response
$[1 + (\omega/\omega_c)^{2m}]^{-1}$ or high-pass response $(\omega/\omega_c)^{2m}\,[1 + (\omega/\omega_c)^{2m}]^{-1}$ per crossover.
Such a minimum-phase crossover is of even order, so that the minimum-order cut-on
slope must be rounded up to the next even order $2\lceil\frac{b+3}{2}\rceil$. Plain high/low crossovers
would be in-phase unless combined with further crossovers to form narrower bands.
However, an in-phase filterbank is obtained after inserting the product of all all-passes
in every band, cf. [28]. Although non-minimum-phase, this is still low-latency. For
the band $b$ containing Ambisonic orders $0 \le n \le b$, the modified filterbank is


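$$ H_b(\omega) = \frac{(\omega/\omega_b)^{2\lceil\frac{b+3}{2}\rceil}}{1 + (\omega/\omega_b)^{2\lceil\frac{b+3}{2}\rceil}}\; \frac{1}{1 + (\omega/\omega_{b+1})^{2\lceil\frac{b+4}{2}\rceil}}\; \prod_{b'=0}^{N} A_{b'}^{\lceil\frac{b'+3}{2}\rceil}(\omega). \tag{7.14} $$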
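The magnitude part of such a crossover is simple to reproduce. The following Python sketch evaluates the low-pass response $[1 + (\omega/\omega_c)^{2m}]^{-1}$ and the high-pass response $(\omega/\omega_c)^{2m}[1 + (\omega/\omega_c)^{2m}]^{-1}$ and computes the even order $2\lceil\frac{b+3}{2}\rceil$ needed to cut on band $b$; the all-pass phase terms are deliberately left out, and the function names are hypothetical.

```python
import math


def lr_order(b):
    """Even Linkwitz-Riley order 2*ceil((b + 3) / 2) used to cut on band b."""
    return 2 * math.ceil((b + 3) / 2)


def lr_lowpass(w, wc, order):
    """Magnitude of the low-pass branch; order is the even overall order 2m."""
    return 1.0 / (1.0 + (w / wc) ** order)


def lr_highpass(w, wc, order):
    """Magnitude of the high-pass branch; order is the even overall order 2m."""
    r = (w / wc) ** order
    return r / (1.0 + r)
```

Both branches sum to unity at every frequency and are at one half (about -6 dB) at the crossover frequency, which is the characteristic property that keeps the summed filterbank flat. The orders obtained for bands 0 to 3 are 4, 4, 6, and 6.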


The sum $\sum_b H_b(\omega)$ is considered to be sufficiently flat, so that the radial filters for
compact spherical loudspeaker arrays using Eqs. (7.8), (7.13), (7.14) become

$$ \rho_n(\omega) = \sum_{b=n}^{N} a_{n,b}\, H_b(\omega)\; \mathrm{i}^{-n}\, w_n^{-1}\, k\, h_n'^{(2)}(ka). \tag{7.15} $$


Figure 7.7 shows the block diagram to control compact spherical loudspeaker arrays 
by Ambisonic input signals, including the radiation control filters, the decoder, and 
a voltage-equalizing crosstalk canceller feeding the loudspeakers. 


Fig. 7.7 Signal processing 
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7.3.2 Control System and Verification Based 
on Measurements 


Velocity equalization/crosstalk cancellation. In the frequency domain, laser vibrom-
eter measurements, cf. Fig. 7.8a, characterize the physical multiple-input-multiple-
output (MIMO) system of transducer input voltages $u_l(\omega)$ to transducer velocities
$v_l(\omega)$

$$ \boldsymbol v(\omega) = \boldsymbol T(\omega)\, \boldsymbol u(\omega), \tag{7.16} $$

including the effect of acoustic coupling through the common enclosure. Corre-
sponding open measurement data sets¹ can be found online, as described in [18].
Theoretically, the frequency-domain inverse of the matrix $\boldsymbol T(\omega)$ can be used to
equalize and control the transducer velocities with acoustic crosstalk cancelled, as
indicated in Fig. 7.7,

$$ \boldsymbol u(\omega) = \boldsymbol T^{-1}(\omega)\, \boldsymbol v(\omega). \tag{7.17} $$

In practice, this is only useful up to the frequency at which the loudspeaker cone
vibration breaks up into modes, so typically below 1 kHz.

Control system. The entire control system with Ambisonic signals $\boldsymbol\chi_N(\omega)$ as inputs
uses Eqs. (7.6), (7.15), (7.17)

$$ \boldsymbol u(\omega) = \boldsymbol T^{-1}(\omega)\, \boldsymbol D\, \mathrm{diag}\{\boldsymbol\rho(\omega)\}\, \boldsymbol\chi_N(\omega). \tag{7.18} $$
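The crosstalk-cancellation step can be illustrated for a single frequency bin with a toy system of only two coupled transducers. In the sketch below, the complex transfer matrix values are purely hypothetical (they are not measurement data of the IKO), and the helper names are chosen for illustration; the point is merely that voltages obtained from the inverse of T reproduce the desired velocities despite the coupling.

```python
def invert_2x2(t):
    """Inverse of a 2x2 (complex) matrix given as a list of two rows."""
    (a, b), (c, d) = t
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product for lists of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


# Hypothetical voltage-to-velocity matrix at one frequency: the off-diagonal
# entries model the acoustic coupling through a shared enclosure.
T = [[1.0 + 0.2j, -0.3 + 0.1j],
     [-0.3 + 0.1j, 1.0 + 0.2j]]

v_target = [1.0 + 0.0j, 0.0 + 0.0j]   # move transducer 1 only
u = matvec(invert_2x2(T), v_target)   # voltages cancelling the crosstalk
v_check = matvec(T, u)                # velocities the modeled system produces
```

Note that the second transducer needs a nonzero voltage to stay at rest, which is exactly what cancelling the acoustic crosstalk means. A real system would repeat this per frequency bin with a 20 × 20 matrix and a regularized inverse.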


Directivity measurement. It is useful to characterize the directivity obtained by
measurements to verify the results; high-resolution 648 × 20 measurements $\boldsymbol G(\omega)$
of the IKO are found online¹. The sound pressure can be decomposed with the
known directional sampling by left-inversion of a spherical harmonics matrix, $(\boldsymbol Y_{17}^{\mathrm T})^{\dagger}$,
see Appendix A.4, Eq. (A.65), which can be up to 17th order on a 10° × 10° grid in
azimuth and zenith:

$$ \boldsymbol p(\omega) = \boldsymbol G(\omega)\, \boldsymbol u(\omega), \quad \Rightarrow \quad \boldsymbol\psi_{17}(\omega) = (\boldsymbol Y_{17}^{\mathrm T})^{\dagger}\, \boldsymbol p(\omega). \tag{7.19} $$

With the highly resolved spherical harmonics coefficients, polar diagrams or balloon
diagrams can be evaluated at any direction

$$ p(\boldsymbol\theta, \omega) = \boldsymbol y_{17}(\boldsymbol\theta)^{\mathrm T}\, \boldsymbol\psi_{17}(\omega), \tag{7.20} $$

given any control system delivering suitable voltages $\boldsymbol u$ for beamforming, as e.g.
obtained by Eq. (7.18).


¹ http://phaidra.kug.ac.at/o:67609.
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(a) velocities (b) directivities 


Fig. 7.8 Measurements on the IKO as a MIMO system in terms of transducer output velocities
(left) and radiation patterns (right) depending on the transducer input voltages

Fig. 7.9 Horizontal cross section of the IKO’s directivity/dB over frequency/Hz and azimuth/degrees
when beamforming to 0° azimuth on the horizon, with radiation control filters above, with filterbank
frequencies (38, 75, 125, 210) Hz


To inspect the frequency-dependent directivity, a horizontal cross section is shown
in Fig. 7.9. The beamforming gets effective above 100 Hz and a beam width of
±30° is held until 2 kHz. The filterbank starts the 0th order above 38 Hz, and with
75, 125, 210 Hz, 1st, 2nd, and 3rd order are successively added including on-axis
equalized max-$r_\mathrm{E}$ weightings. Above 2 kHz both spatial aliasing and modal breakup
of the transducer cones affect directivity. However, these beamforming-direction-
dependent distortions are often negligible in typical rooms.
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7.4 Auditory Objects of the IKO 


7.4.1 Static Auditory Objects 


The study in [16] showed that distance control by changing the directivity and its
orientation can also be achieved with the IKO in a real room, cf. Fig. 7.10. The
experiments used stationary pink noise and could create auditory objects nearly 2 m
behind the IKO, which corresponds to the distance between the IKO and the front
wall of the playback room.

The maximum distance of auditory objects created by the IKO is strongly signal-
dependent. Experiments in [14] showed that the auditory distance of pink noise
bursts decreased for shorter fade-in times, while the fade-out time had no influence,
cf. Fig. 7.11. A transient click sound was perceived even closer to the IKO. This can
be explained by the precedence effect, which favors the earlier direct sound over the
reflected sound from the walls. While this effect is strong for transient sounds, it is
inhibited for stationary sounds with long fade-in times.

However, the precedence effect can even be reduced for transient click sounds
by simultaneous playback of a masker sound that reduces the influence of the direct
sound [29]. In comparison to no masker, playing a pink noise masker doubles the
auditory distance, cf. Fig. 7.12. Using the room noise as a masker by playing the
target sound very softly further increases the distance and yields a perception that is
detached from the IKO.


7.4.2 Moving Auditory Objects 


The studies in [14, 15] extended the previous listening experiments towards simple 
time-varying beam directions, such as from the left to the right, front/back or circles. 
To report the perceived locations of the moving auditory objects, listeners used a 
touch screen that showed a floor plan of the room, including the listening position 
and the position of the IKO. They had to indicate the location of the auditory object’s 
trajectory every 500 ms. The perceived trajectories depend on the listening position,
but they can always be recognized, cf. Fig. 7.13. The empirical knowledge was
applied in the artistic study in [14] about body-space relations, composing sounds
that are spatialized with different static directions and simple movements.

Fig. 7.13 Average perceived locations for each 500 ms step during front/back-movement (dark
gray) and left/right-movement (light gray) at two listening positions, triangle indicates start and
asterisk end of the trajectory

Fig. 7.14 Average perceived locations for each 500 ms step during circular movement of transient
sound (dark gray) and stationary noise (light gray) without and with additional reflectors, triangle
indicates start and asterisk end of the trajectory (two panels; horizontal axis: x position in m)

For concerts, the artistic practice evolved to set the IKO up together with reflector 
baffles, cf. Fig. 7.14. A recent study in [30] investigated their effect on the perception 
of moving transient and stationary sounds. The baffles obviously reduce the signal- 
dependency by contributing more additional reflection paths, contrasting the direct 
sound. 


7.5 Practical Free-Software Examples 


7.5.1 IEM Room Encoder and Directivity Shaper 


The IEM Room Encoder VST plug-in, cf. Fig. 4.36, can not only be used to sim- 
ulate the room reflections of an omnidirectional sound source based on the image- 
source method, but it also supports directional sound sources. As format, it employs 
Ambisonics with ACN ordering and adjustable normalization up to seventh order. 
Thus, it allows using data from directivity measurements or even directional
recordings done with a surrounding spherical microphone, e.g. to put real instrument
recordings into the virtual room.

As an alternative, the IEM Directivity Shaper, cf. Fig. 7.15, provides simple
means to generate a frequency-dependent directivity pattern from scratch and to 
apply it on a mono input signal. This is useful to generate the typical rotary speaker 
effect of a Leslie cabinet. 
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Fig. 7.15 IEM Directivity Shaper plug-in 


7.5.2 IEM Cubes 5.1 Player and Surround with Depth 


As shown in Fig. 7.5, a pair of loudspeaker cubes can create a stable auditory event 
in between them to replace an actual center loudspeaker. In order to play back an 
entire 5.1 production, the IEM cubes 5.1 Player plug-in extends this approach by 
two additional beams to the side walls for the surround channels, cf. Fig. 7.16. The 
plug-in provides a control of the shape, direction, and level of all beams, as well as 
a delay compensation for the reflection paths. 

Fig. 7.16 IEM cubes 5.1 Player, cubes Surround Decoder and Distance
Encoder plug-ins

Surround sound with depth can be realized with a quadraphonic setup of four loud-
speaker cubes and a combination of the cubes Surround Decoder and multiple
Distance Encoder plug-ins, cf. Fig. 7.16. For each source, the Distance Encoder
controls position and distance, i.e. the blending between the two layers. The output of
the plug-in is a 10-channel audio stream including 7 channels for third-order (inner
layer) and 3 for first-order 2D Ambisonics (outer depth layer). The cubes Surround
Decoder plug-in decodes the 10-channel audio stream and distributes the signals to
the 16 drivers of four loudspeaker cubes. For each loudspeaker cube, the direc-
tions to excite direct and reflected sound of the inner layer and the diffuse sound
of the depth layer can be adjusted in order to adapt to the playback environment.
Additionally, the directivity patterns for direct, reflected, and diffuse sound beams
can be controlled, as well as a delay to compensate for the longer propagation paths of
the reflected sound. The plug-ins are available under
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7.5.3 IKO 


Spatialization using the IKO can use a similar infrastructure of plug-ins as surround- 
ing loudspeaker arrays. Ambisonic encoder plug-ins, such as the ambix_encoder 
or the IEM stereoEncoder or MultiEncoder, create the third-order Ambisonic 
signals that are subsequently fed to a decoder. Decoding to the IKO requires the 
processing steps as shown in Fig. 7.7: radiation control filters in the spherical har- 
monic domain, decoding from spherical harmonics to transducer signals, as well as 
crosstalk cancellation and equalization of the transducers. This processing can be
summarized in a 16 (spherical harmonics up to third order) × 20 (transducers) fil-
ter matrix. Convolution can be done efficiently using the mcfx_convolver plug-in.
Filter presets for the IKO can be found under http://phaidra.kug.ac.at/o:79235.
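The channel counts quoted in this section follow directly from the Ambisonic order. As a small bookkeeping aid, the sketch below (with hypothetical helper names) uses the standard counts of $(N+1)^2$ signals for 3D and $2N+1$ signals for 2D Ambisonics of order $N$.

```python
def channels_3d(order):
    """Number of Ambisonic signals for full 3D Ambisonics of the given order."""
    return (order + 1) ** 2


def channels_2d(order):
    """Number of Ambisonic signals for horizontal-only (2D) Ambisonics."""
    return 2 * order + 1


iko_inputs = channels_3d(3)                      # rows of the IKO filter matrix
depth_stream = channels_2d(3) + channels_2d(1)   # inner layer + outer depth layer
```

This reproduces the 16 inputs of the 16 × 20 IKO filter matrix and the 10-channel stream (7 + 3) of the surround-with-depth plug-ins.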


References 


1. P. Boulez, L’acoustique et la musique contemporaine, in Proceedings of the 11th International
Congress on Acoustics, vol. 8 (Paris, 1983),
https://www.icacommission.org/Proceedings/ICA1983Paris/ICA11%20Proceedings%20Vol8.pdf

2. M. Frank, G.K. Sharma, F. Zotter, What we already know about spatialization with com- 
pact spherical arrays as variable-directivity loudspeakers, in Proceedings of the inSONIC2015 
(Karlsruhe, 2015) 

3. O. Warusfel, P. Derogis, R. Causse, Radiation synthesis with digitally controlled loudspeakers, 
in 103rd AES Convention (Paris, 1997) 

4. P. Kassakian, D. Wessel, Characterization of spherical loudspeaker arrays, in Convention paper 
#1430 (San Francisco, 2004) 

5. R. Avizienis, A. Freed, P. Kassakian, D. Wessel, A compact 120 independent element spherical 
loudspeaker array with programmable radiation patterns, in 120th AES Convention (France, 
Paris, 2006) 

6. F. Zotter, R. Höldrich, Modeling radiation synthesis with spherical loudspeaker arrays, in 
Proceedings of the ICA (Madrid, 2007), http://iaem.at/Members/zotter 

7. F. Zotter, Analysis and synthesis of sound-radiation with spherical arrays, PhD. Thesis, Uni- 
versity of Music and Performing Arts, Graz (2009) 

8. H. Pomberger, Angular and radial directivity control for spherical loudspeaker arrays, M.
Thesis, Institut für Elektronische Musik und Akustik, Kunstuni Graz, Technical University
Graz, Graz, A (2008)

9. M. Pollow, G.K. Behler, Variable directivity for platonic sound sources based on spherical 
harmonics optimization. Acta Acust. United Acust. 95(6), 1082-1092 (2009) 

10. A.M. Pasqual, P. Herzog, J.R. Arruda, Theoretical and experimental analysis of the behavior 
of a compact spherical loudspeaker array for directivity control. J. Acoust. Soc. Am. 128(6), 
3478-3488 (2010) 

11. A. Schmeder, An exploration of design parameters for human-interactive systems with compact
spherical loudspeaker arrays, in 1st Ambisonics Symposium (Graz, 2009)

12. G.K. Sharma, F. Zotter, M. Frank, Orchestrating wall reflections in space by icosahedral loud- 
speaker: findings from first artistic research exploration, in Proceedings of the ICMC/SMC 
(Athens, 2014), pp. 830-835 

13. F. Zotter, M. Frank, A. Fuchs, D. Rudrich, Preliminary study on the perception of orientation- 
changing directional sound sources in rooms, in Proceedings of the Forum Acusticum (Kraków, 
2014) 




14. F. Wendt, G.K. Sharma, M. Frank, F. Zotter, R. Höldrich, Perception of spatial sound phenomena
created by the icosahedral loudspeaker. Comput. Music J. 41(1) (2017)

15. F. Zotter, M. Zaunschirm, M. Frank, M. Kronlachner, A beamformer to play with wall reflec- 
tions: the icosahedral loudspeaker. Comput. Music J. 41(3) (2017) 

16. F. Wendt, F. Zotter, M. Frank, R. Höldrich, Auditory distance control using a variable-directivity
loudspeaker. MDPI Appl. Sci. 7(7) (2017)

17. F. Wendt, M. Frank, On the localization of auditory objects created by directional sound sources 
in a virtual room, in VDT Tonmeistertagung (Cologne, 2018) 

18. F. Schultz, M. Zaunschirm, F. Zotter, Directivity and electro-acoustic measurements of the 
IKO, in Audio Engineering Society Convention 144 (2018), http://www.aes.org/e-lib/browse. 
cfm?elib=19557 

19. T. Deppisch, N. Meyer-Kahlen, F. Zotter, M. Frank, Surround with depth on first-order beam- 
controlling loudspeakers, in Audio Engineering Society Convention 144 (2018), http://www. 
aes.org/e-lib/browse.cfm?elib=19494 

20. M.-V. Laitinen, A. Politis, I. Huhtakallio, V. Pulkki, Controlling the perceived distance of 
an auditory object by manipulation of loudspeaker directivity. J. Acoust. Soc. Am. 137(6) 
EL462-EL468 (2015) 

21. F. Zotter, M. Frank, Investigation of auditory objects caused by directional sound sources
in rooms. Acta Phys. Pol. A 128(1-A) (2015),
http://przyrbwn.icm.edu.pl/APP/PDF/128/a128z1ap01.pdf

22. F. Zagala, J. Linke, F. Zotter, M. Frank, Amplitude panning between beamforming-controlled 
direct and reflected sound, in Audio Engineering Society Convention 142 (Berlin, 2017), http:// 
www.aes.org/e-lib/browse.cfm?elib=18679 

23. N. Misdariis, F. Nicolas, O. Warusfel, R. Caussé, Radiation control on multi-loudspeaker
device: La Timée, in International Computer Music Conference (La Habana, 2001)

24. N. Meyer-Kahlen, F. Zotter, K. Pollack, Design and measurement of a first-order, in Audio Engi- 
neering Society Convention 144 (2018), http://www.aes.org/e-lib/browse.cfm?elib=19559 

25. F. Zotter, H. Pomberger, A. Schmeder, Efficient directivity pattern control for spherical loud-
speaker arrays, in ACOUSTICS’08 (2008)

26. F. Zotter, A. Schmeder, M. Noisternig, Crosstalk cancellation for spherical loudspeaker arrays, 
in Fortschritte der Akustik - DAGA (Dresden, 2008) 

27. S.P. Lipshitz, J. Vanderkooy, In-phase crossover network design. J. Audio Eng. Soc. 34(11),
889-894 (1986)

28. S. Lösler, F. Zotter, Comprehensive radial filter design for practical higher-order ambisonic
recording, in Fortschritte der Akustik — DAGA Nürnberg (2015)

29. J. Linke, F. Wendt, F. Zotter, M. Frank, How masking affects auditory objects of beamformed 
sounds, in Fortschritte der Akustik, DAGA (Munich, 2018) 

30. J. Linke, F. Wendt, F. Zotter, M. Frank, How the perception of moving sound beams is influenced 
by masking and reflector setup, in VDT Tonmeistertagung (Cologne, 2018) 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter’s Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Appendix 


Many have written of the experience of mathematical beauty as 
being comparable to that derived from the greatest art. 


S. Zeki, J.P. Romaya, D.M.T. Benincasa, M.F. Atiyah [1] 
Frontiers in Human Neuroscience, Feb. 2014. 


A.1 Harmonic Functions 


The Laplacian is defined in the D-dimensional Cartesian space as

$$ \Delta = \sum_{j=1}^{D} \frac{\partial^2}{\partial x_j^2}. $$

The Laplacian eigenproblem

$$ \Delta f = -\lambda f $$

is solved by harmonics, on a finite interval.
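A quick numerical experiment illustrates the eigenproblem in one dimension: applying a finite-difference approximation of the second derivative to a harmonic $\cos(kx)$ returns the same function scaled by $-k^2$, i.e. $\lambda = k^2$. The sketch below is only a plausibility check with hypothetical function names and an arbitrarily chosen step size.

```python
import math


def second_difference(f, x, h=1e-4):
    """Central finite-difference approximation of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def harmonic(k):
    """Return the one-dimensional harmonic x -> cos(k x)."""
    return lambda x: math.cos(k * x)


k = 3.0
f = harmonic(k)
x0 = 0.7
eigenvalue_estimate = -second_difference(f, x0) / f(x0)  # close to k**2 = 9
```

On a finite interval with periodic boundaries, only those values of $k$ are admissible for which the harmonic repeats itself after the interval length, which makes the set of eigenvalues discrete.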


A.2 Laplacian in Orthogonal Coordinates 


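In general, coordinates can be expressed by n-tuples of values. For instance, the
Cartesian coordinates are $(x_1, x_2, \dots)$ and coordinates of another coordinate system
are $(u_1, u_2, \dots)$, and both describe the location of a point in a space depending on a
finite number of dimensions. Each location of the space should be accessible by both
coordinate systems and there should be a bijective mapping between both systems,
e.g. $u_j = u_j(x_1, x_2, \dots)$. A single differentiation with regard to the component $x_i$,
for instance, is described by the chain rule and consists of the sum of weighted partial
differentials with regard to $u_j$:

$$ \frac{\partial}{\partial x_i} = \sum_j \frac{\partial u_j}{\partial x_i}\, \frac{\partial}{\partial u_j}. \tag{A.1} $$

Written in terms of vectors, the (Cartesian) gradient $\nabla = \frac{\partial}{\partial \boldsymbol x}$ with $\frac{\partial}{\partial \boldsymbol x} = \big[\frac{\partial}{\partial x_i}\big]_i$ yields:

$$ \frac{\partial}{\partial \boldsymbol x} = \frac{\partial \boldsymbol u^{\mathrm T}}{\partial \boldsymbol x}\, \frac{\partial}{\partial \boldsymbol u} = \boldsymbol J_{u(x)}\, \frac{\partial}{\partial \boldsymbol u}, \tag{A.2} $$

for which the Jacobian matrix $\boldsymbol J_{u(x)} = \big[\frac{\partial u_j}{\partial x_i}\big]_{ij}$ that is either written in dependency
of $\boldsymbol x$ or $\boldsymbol u$ represents all the partial derivatives of the mapping between the coordinate
systems. For bijective mappings also the Jacobian of the inverse mapping exists, $\boldsymbol J_{x(u)} = \frac{\partial \boldsymbol x^{\mathrm T}}{\partial \boldsymbol u}$.
The coordinate systems are equivalent if the determinant of the Jacobian is
non-zero, $|\boldsymbol J| \neq 0$.

Orthogonal coordinate systems have the interesting property that the rows of the
Jacobian (or its columns) are orthogonal, so that $\boldsymbol J^{\mathrm T} \boldsymbol J$ yields a diagonal matrix.
With $\boldsymbol J_{x(u)} = \frac{\partial \boldsymbol x^{\mathrm T}}{\partial \boldsymbol u}$ the meaning of this property becomes easier to understand: the
differential changes in the location $\partial \boldsymbol x / \partial u_j$ of each Cartesian coordinate into the
direction of each individual non-Cartesian coordinate $u_j$ describe an orthogonal set
of motion directions in space, whose orientation depends on the location and whose
individual lengths may vary.

To obtain a description of the Laplacian in the Helmholtz equation, $\Delta = \sum_i \frac{\partial^2}{\partial x_i^2}$,
is our goal here, and it can be obtained with the chain rule, now calculating from $x_i$
to $u_j$,

$$ \begin{aligned} \Delta = \sum_i \frac{\partial}{\partial x_i}\Big(\frac{\partial}{\partial x_i}\Big) &= \sum_i \frac{\partial}{\partial x_i}\Big(\sum_j \frac{\partial u_j}{\partial x_i}\, \frac{\partial}{\partial u_j}\Big) \\ &= \sum_j \sum_i \frac{\partial^2 u_j}{\partial x_i^2}\, \frac{\partial}{\partial u_j} + \sum_{j,k} \underbrace{\sum_i \frac{\partial u_j}{\partial x_i} \frac{\partial u_k}{\partial x_i}}_{[\boldsymbol J^{\mathrm T} \boldsymbol J]_{jk},\ \text{ortho: diag}} \frac{\partial^2}{\partial u_j\, \partial u_k} \\ &= \sum_j \Big[\sum_i \frac{\partial^2 u_j}{\partial x_i^2}\Big] \frac{\partial}{\partial u_j} + \sum_j \big[\boldsymbol 1^{\mathrm T} (\boldsymbol J \circ \boldsymbol J)\big]_j\, \frac{\partial^2}{\partial u_j^2}, \end{aligned} \tag{A.3} $$

with $\circ$ denoting the element-wise, i.e. Hadamard product, so that $\big[\boldsymbol 1^{\mathrm T} (\boldsymbol J \circ \boldsymbol J)\big]_j = \sum_i \big(\frac{\partial u_j}{\partial x_i}\big)^2$. Orthogonal coordinates
largely simplify the Laplacian (see last line) and make it consist of first- and second-
order differentials with regard to the new coordinates, individually, with all mixed
derivatives canceling. Both first- and second-order differentials are weighted by the
partial derivatives of the coordinate mapping. For each $u_j$, the Laplacian is composed
of those two expressions

$$ \Delta = \sum_j \Delta_{u_j}, \quad \text{where} \quad \Delta_{u_j} = \Big[\sum_i \frac{\partial^2 u_j}{\partial x_i^2}\Big] \frac{\partial}{\partial u_j} + \Big[\sum_i \Big(\frac{\partial u_j}{\partial x_i}\Big)^2\Big] \frac{\partial^2}{\partial u_j^2}. \tag{A.4} $$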


A.3 Laplacian in Spherical Coordinates 


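The right-handed spherical coordinate system in ISO 31-11, ISO 80000-2, [2, 3], uses
a radius $r$, an azimuth angle $\varphi$, and a zenith angle $\vartheta$, mapping to Cartesian coordinates
$x = r\cos\varphi\sin\vartheta$, $y = r\sin\varphi\sin\vartheta$, $z = r\cos\vartheta$, or inversely $r = \sqrt{x^2+y^2+z^2}$,
$\varphi = \arctan\frac{y}{x}$, $\vartheta = \arctan\frac{\sqrt{x^2+y^2}}{z}$, see Fig. 4.11.

Re-expressing the zenith angle coordinate by $\zeta = \cos\vartheta = \frac{z}{r}$ reduces the effort in
calculation and yields $x = r\cos\varphi\sqrt{1-\zeta^2}$, $y = r\sin\varphi\sqrt{1-\zeta^2}$, $z = r\zeta$.

In order to obtain solutions along the angular dimensions azimuth and zenith,
we first need to re-write the Laplacian from Cartesian to spherical coordinates. For
the first-order derivative along the x axis, we get the generalized differential

$$\frac{\partial}{\partial x} = \frac{\partial r}{\partial x}\frac{\partial}{\partial r} + \frac{\partial \varphi}{\partial x}\frac{\partial}{\partial \varphi} + \frac{\partial \zeta}{\partial x}\frac{\partial}{\partial \zeta}.$$

As the Cartesian and spherical coordinates are orthogonal, any mixed
second-order derivatives in Cartesian or spherical coordinates vanish. We may
differentiate a second time with respect to x:

$$\frac{\partial}{\partial x}\Big(\frac{\partial}{\partial x}\Big) = \Big[\frac{\partial^2 r}{\partial x^2} + \Big(\frac{\partial r}{\partial x}\Big)^2\frac{\partial}{\partial r}\Big]\frac{\partial}{\partial r} + \Big[\frac{\partial^2 \varphi}{\partial x^2} + \Big(\frac{\partial \varphi}{\partial x}\Big)^2\frac{\partial}{\partial \varphi}\Big]\frac{\partial}{\partial \varphi} + \Big[\frac{\partial^2 \zeta}{\partial x^2} + \Big(\frac{\partial \zeta}{\partial x}\Big)^2\frac{\partial}{\partial \zeta}\Big]\frac{\partial}{\partial \zeta}.$$

Obviously, we require all first-order derivatives squared, and all second-order
derivatives of the spherical coordinates.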


A.3.1 The Radial Part 


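With $r = \sqrt{x^2+y^2+z^2}$ we obtain for the radial part $\Delta_r = \sum_{x_i \in \{x,y,z\}}\Big[\frac{\partial^2 r}{\partial x_i^2} + \big(\frac{\partial r}{\partial x_i}\big)^2\frac{\partial}{\partial r}\Big]\frac{\partial}{\partial r}$ of
the Laplacian

$$\frac{\partial r}{\partial x} = \frac{1}{2}\,\frac{2x}{\sqrt{x^2+y^2+z^2}} = \frac{x}{r}, \qquad \Big(\frac{\partial r}{\partial x}\Big)^2 = \frac{x^2}{r^2},$$

$$\frac{\partial^2 r}{\partial x^2} = \frac{\partial}{\partial x}\Big[\frac{x}{r}\Big] = \frac{1}{r} - \frac{x}{r^2}\frac{\partial r}{\partial x} = \frac{1}{r} - \frac{x^2}{r^3} = \frac{r^2 - x^2}{r^3}.$$

For x, y, z altogether, this is:

$$\Delta_r = \frac{3r^2 - x^2 - y^2 - z^2}{r^3}\frac{\partial}{\partial r} + \frac{x^2+y^2+z^2}{r^2}\frac{\partial^2}{\partial r^2} = \frac{2}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial r^2}. \tag{A.5}$$

2D. In two dimensions, there is no z coordinate, therefore there is just one term fewer:

$$\Delta_{r,\mathrm{2D}} = \frac{2r^2 - x^2 - y^2}{r^3}\frac{\partial}{\partial r} + \frac{x^2+y^2}{r^2}\frac{\partial^2}{\partial r^2} = \frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial r^2}. \tag{A.6}$$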


A.3.2 The Azimuthal Part 


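With $\varphi = \arctan\frac{y}{x}$ and $\frac{d}{dx}\arctan x = \frac{1}{1+x^2}$,
the azimuthal part $\Delta_\varphi = \sum_{x_i \in \{x,y\}}\Big[\frac{\partial^2 \varphi}{\partial x_i^2} + \big(\frac{\partial \varphi}{\partial x_i}\big)^2\frac{\partial}{\partial \varphi}\Big]\frac{\partial}{\partial \varphi}$
of the Laplacian becomes (there is no dependency of $\varphi$ on z)

$$\frac{\partial \varphi}{\partial x} = \frac{1}{1+\frac{y^2}{x^2}}\,\frac{\partial}{\partial x}\Big[\frac{y}{x}\Big] = -\frac{1}{1+\frac{y^2}{x^2}}\,\frac{y}{x^2} = -\frac{y}{x^2+y^2}, \qquad
\frac{\partial \varphi}{\partial y} = \frac{1}{1+\frac{y^2}{x^2}}\,\frac{1}{x} = \frac{x}{x^2+y^2},$$

$$\frac{\partial^2 \varphi}{\partial x^2} = \frac{\partial}{\partial x}\Big[-\frac{y}{x^2+y^2}\Big] = \frac{2xy}{(x^2+y^2)^2}, \qquad
\frac{\partial^2 \varphi}{\partial y^2} = \frac{\partial}{\partial y}\Big[\frac{x}{x^2+y^2}\Big] = -\frac{2xy}{(x^2+y^2)^2},$$

$$\Delta_\varphi = \frac{2xy - 2xy}{(x^2+y^2)^2}\frac{\partial}{\partial \varphi} + \frac{x^2+y^2}{(x^2+y^2)^2}\frac{\partial^2}{\partial \varphi^2} = \frac{1}{x^2+y^2}\frac{\partial^2}{\partial \varphi^2} = \frac{1}{r^2(1-\zeta^2)}\frac{\partial^2}{\partial \varphi^2}. \tag{A.7}$$

2D. In two dimensions, $\zeta = 0$, therefore

$$\Delta_{\varphi,\mathrm{2D}} = \frac{1}{r^2}\frac{\partial^2}{\partial \varphi^2}. \tag{A.8}$$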


A.3.3 The Zenithal Part 


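The zenith angle is actually $\vartheta$, and we define $\zeta = \cos\vartheta$ as a variable to express
it in order to simplify the derivation. With $\zeta = \frac{z}{\sqrt{x^2+y^2+z^2}} = \frac{z}{r}$, the zenithal part
$\Delta_\zeta = \sum_{x_i \in \{x,y,z\}}\Big[\frac{\partial^2 \zeta}{\partial x_i^2} + \big(\frac{\partial \zeta}{\partial x_i}\big)^2\frac{\partial}{\partial \zeta}\Big]\frac{\partial}{\partial \zeta}$ becomes

$$\frac{\partial \zeta}{\partial x} = \frac{\partial}{\partial x}\Big[\frac{z}{r}\Big] = -\frac{z}{r^2}\frac{x}{r} = -\frac{xz}{r^3}, \qquad \Big(\frac{\partial \zeta}{\partial x}\Big)^2 = \frac{x^2z^2}{r^6},$$

$$\frac{\partial \zeta}{\partial z} = \frac{\partial}{\partial z}\Big[\frac{z}{r}\Big] = \frac{1}{r} - \frac{z}{r^2}\frac{z}{r} = \frac{r^2 - z^2}{r^3}, \qquad \Big(\frac{\partial \zeta}{\partial z}\Big)^2 = \frac{(r^2-z^2)^2}{r^6},$$

$$\frac{\partial^2 \zeta}{\partial x^2} = \frac{\partial}{\partial x}\Big[-\frac{xz}{r^3}\Big] = -\frac{z}{r^3} + \frac{3x^2z}{r^5} = z\,\frac{3x^2 - r^2}{r^5},$$

$$\frac{\partial^2 \zeta}{\partial z^2} = \frac{\partial}{\partial z}\Big[\frac{1}{r} - \frac{z^2}{r^3}\Big] = -\frac{z}{r^3} - \frac{2z}{r^3} + \frac{3z^3}{r^5} = 3z\,\frac{z^2 - r^2}{r^5}.$$

For x, y, and z altogether, we get

$$\begin{aligned}
\Delta_\zeta &= z\,\frac{3x^2 + 3y^2 - 2r^2 + 3z^2 - 3r^2}{r^5}\frac{\partial}{\partial \zeta} + \frac{(x^2+y^2)\,z^2 + (r^2-z^2)^2}{r^6}\frac{\partial^2}{\partial \zeta^2} \\
&= -\frac{2z}{r^3}\frac{\partial}{\partial \zeta} + \frac{(r^2-z^2)\,r^2}{r^6}\frac{\partial^2}{\partial \zeta^2} = -\frac{2\zeta}{r^2}\frac{\partial}{\partial \zeta} + \frac{1-\zeta^2}{r^2}\frac{\partial^2}{\partial \zeta^2}.
\end{aligned} \tag{A.9}$$

2D. This part does not exist in 2D.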
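
The three parts (A.5), (A.7), and (A.9) can be cross-checked numerically. The following is only a minimal sketch under stated assumptions: the test function $f = x^2y + z^3 + xz$ (whose Cartesian Laplacian is $2y + 6z$), the evaluation point, and the finite-difference step are arbitrary choices made for illustration, and all function names are hypothetical.

```python
import math

def f_cart(x, y, z):
    # Arbitrary test function; its Cartesian Laplacian is 2*y + 6*z.
    return x * x * y + z ** 3 + x * z

def f_sph(r, phi, zeta):
    # The same function expressed in (r, phi, zeta) with zeta = cos(theta).
    s = math.sqrt(1.0 - zeta * zeta)
    return f_cart(r * math.cos(phi) * s, r * math.sin(phi) * s, r * zeta)

def d1(g, t, h):
    # Central first-order finite difference.
    return (g(t + h) - g(t - h)) / (2.0 * h)

def d2(g, t, h):
    # Central second-order finite difference.
    return (g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h)

def laplacian_sph(r, phi, zeta, h=1e-4):
    fr = lambda t: f_sph(t, phi, zeta)
    fp = lambda t: f_sph(r, t, zeta)
    fz = lambda t: f_sph(r, phi, t)
    radial = 2.0 / r * d1(fr, r, h) + d2(fr, r, h)               # (A.5)
    azimuthal = d2(fp, phi, h) / (r * r * (1.0 - zeta * zeta))    # (A.7)
    zenithal = (-2.0 * zeta * d1(fz, zeta, h)
                + (1.0 - zeta * zeta) * d2(fz, zeta, h)) / (r * r)  # (A.9)
    return radial + azimuthal + zenithal

r, phi, zeta = 1.3, 0.7, 0.4
s = math.sqrt(1.0 - zeta * zeta)
y, z = r * math.sin(phi) * s, r * zeta
exact = 2.0 * y + 6.0 * z
approx = laplacian_sph(r, phi, zeta)
```

Up to the finite-difference error, `approx` should agree with `exact`, which supports that no mixed derivatives are missing from the sum of the three parts.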
A.3.4 Azimuthal Solution in 2D and 3D 
The azimuth harmonics are found by solving $r^2(1-\zeta^2)\,\Delta_\varphi \Phi = -\lambda_\varphi^2\,\Phi$,

$$\frac{d^2}{d\varphi^2}\Phi = -\lambda_\varphi^2\,\Phi. \tag{A.10}$$

We know that $\cos'' x = -\cos x$ and $\sin'' x = -\sin x$, therefore we can insert the
solutions

$$\Phi = \begin{cases}\cos(a\varphi), & \text{for } a \geq 0,\\ \sin(|a|\varphi), & \text{for } a < 0,\end{cases}$$

and obtain with $\frac{d^2}{d\varphi^2}\Phi = -a^2\Phi$ the characteristic equation that fixes $a$

$$-a^2 = -\lambda_\varphi^2.$$

Geometrically, we desire that $\Phi(\varphi) = \Phi(\varphi + 2\pi l)$ with $l \in \mathbb{Z}$. This is only possible
with $\lambda_\varphi^2 = m^2$, and $m \in \mathbb{Z}$.

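We can therefore define for $-\infty < m < \infty$ the terms of a normalized Fourier
series

$$\Phi_m(\varphi) = \frac{1}{\sqrt{2\pi}}\begin{cases}\sqrt{2}\,\sin(|m|\varphi), & \text{for } m < 0,\\ 1, & \text{for } m = 0,\\ \sqrt{2}\,\cos(m\varphi), & \text{for } m > 0.\end{cases} \tag{A.11}$$

The azimuth harmonics are orthogonal: none of the products $\cos(i\varphi)\sin(j\varphi)$,
$\cos(i\varphi)\cos(j\varphi)$, or $\sin(i\varphi)\sin(j\varphi)$ produces a constant component unless $i = j$,
excluding the mixed cosine and sine product. Normalization ensures that the
non-zero result is unity

$$\int_0^{2\pi} \Phi_i\,\Phi_j\,d\varphi = \delta_{ij} = \begin{cases}1, & \text{for } i = j,\\ 0, & \text{else},\end{cases} \tag{A.12}$$

by $\int_0^{2\pi}\frac{1}{2\pi}\,d\varphi = \int_0^{2\pi}\frac{\cos^2(m\varphi)}{\pi}\,d\varphi = \int_0^{2\pi}\frac{\sin^2(m\varphi)}{\pi}\,d\varphi = 1$.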
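
The orthonormality (A.12) is easy to verify numerically. The sketch below is an illustration rather than part of the derivation: it evaluates the integral by a rectangle rule on `K` uniform azimuth samples, which is exact (up to rounding) for trigonometric products whose frequency stays below `K`. The function names and the choice `K = 64` are assumptions.

```python
import math

def circ_harm(m, phi):
    """Normalized circular harmonic Phi_m(phi), cf. (A.11)."""
    if m < 0:
        return math.sqrt(2.0) * math.sin(-m * phi) / math.sqrt(2.0 * math.pi)
    if m == 0:
        return 1.0 / math.sqrt(2.0 * math.pi)
    return math.sqrt(2.0) * math.cos(m * phi) / math.sqrt(2.0 * math.pi)

def inner(i, j, K=64):
    # Rectangle rule over one period on K uniform samples.
    dphi = 2.0 * math.pi / K
    return dphi * sum(circ_harm(i, k * dphi) * circ_harm(j, k * dphi)
                      for k in range(K))

orders = list(range(-3, 4))
gram = [[inner(i, j) for j in orders] for i in orders]
```

The matrix `gram` should be the identity matrix to within rounding errors.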


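2D. With unbounded $|m| \to \infty$, the circular harmonics are complete in
the Hilbert space of square-integrable circular polynomials. By their orthonormality,
we can derive a transformation integral of a function $g(\varphi)$ that should be represented
as series

$$g(\varphi) = \sum_{m'=-\infty}^{\infty} \gamma_{m'}\,\Phi_{m'}(\varphi) \tag{A.13}$$

by integration of $g$ over $\Phi_m$, $\int_0^{2\pi} d\varphi$,

$$\int_0^{2\pi} g(\varphi)\,\Phi_m\,d\varphi = \sum_{m'=-\infty}^{\infty}\gamma_{m'}\underbrace{\int_0^{2\pi}\Phi_{m'}\,\Phi_m\,d\varphi}_{=\delta_{m'm}} = \gamma_m. \tag{A.14}$$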

2D panning functions. To decompose an infinitely narrow unit-surface Dirac delta
function that represents an infinite-order panning function towards the direction $\varphi_s$,

$$\delta(\varphi - \varphi_s) = \lim_{\varepsilon \to 0^+}\begin{cases}\frac{1}{2\varepsilon}, & \text{for } |\varphi - \varphi_s| < \varepsilon,\\ 0, & \text{otherwise},\end{cases} \tag{A.15}$$

we obtain the coefficients from the transformation integral

$$\gamma_m = \int_0^{2\pi}\delta(\varphi - \varphi_s)\,\Phi_m(\varphi)\,d\varphi = \Phi_m(\varphi_s). \tag{A.16}$$

Typically, a finite-order series will be employed as a panning function

$$g_N(\varphi) = \sum_{m=-N}^{N} a_m\,\Phi_m(\varphi_s)\,\Phi_m(\varphi), \tag{A.17}$$

involving a weight $a_m = a_{|m|}$ controlling the side lobes. To evaluate the E measure
of loudness, we can write

$$E = \int_0^{2\pi} g_N^2\,d\varphi = \sum_{m=-N}^{N} a_m^2\,\Phi_m^2(\varphi_s) = \sum_{m=0}^{N}\frac{2-\delta_m}{2\pi}\,a_m^2. \tag{A.18}$$

For $\varphi_s = 0$, we obtain an axisymmetric function in terms of a pure cosine series, as
$\sin 0 = 0$,

$$g_N(\varphi) = \sum_{m=0}^{N} a_m\,\frac{2-\delta_m}{2\pi}\,\cos(m\varphi), \tag{A.19}$$

with $\delta_m = 1$ for $m = 0$ and 0 elsewhere. The axisymmetric panning function
is easier to design.


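2D max-$r_E$. For the narrowest-possible spread, we maximize the length of $r_E$,

$$r_E = \frac{\int_0^{2\pi} g_N^2\cos\varphi\,d\varphi}{\int_0^{2\pi} g_N^2\,d\varphi}
= \frac{\sum_{i,j=0}^{N} a_i a_j\,(2-\delta_i)(2-\delta_j)\int_0^{2\pi}\cos(i\varphi)\cos(j\varphi)\cos(\varphi)\,d\varphi}{(2\pi)^2\,E}
= \frac{\sum_{i=1}^{N} a_i\,a_{i-1}}{\pi\,E} = \frac{\hat{r}_E}{\hat{E}}, \tag{A.20}$$

where we used $\cos(i\varphi)\cos(\varphi) = \frac{\cos[(i+1)\varphi] + \cos[(i-1)\varphi]}{2}$, inserted the orthogonality
of the cosine $\int_0^{2\pi}\frac{2-\delta_i}{2\pi}\cos(i\varphi)\cos(j\varphi)\,d\varphi = \delta_{ij}$, combined
$\sum_i (a_i a_{i-1} + a_i a_{i+1}) = 2\sum_i a_i a_{i-1}$, and abbreviated numerator and denominator by
$\hat{r}_E = 2\sum_{i=1}^{N} a_i a_{i-1}$ and $\hat{E} = 2\pi E = \sum_{m=0}^{N}(2-\delta_m)\,a_m^2$. To maximize, we zero the
derivative with respect to $a_m$

$$\frac{\partial r_E}{\partial a_m} = \frac{\hat{r}_E'}{\hat{E}} - \frac{\hat{r}_E\,\hat{E}'}{\hat{E}^2} = \frac{1}{\hat{E}}\big[\hat{r}_E' - \hat{E}'\,r_E\big] = 0
\;\Rightarrow\; a_{m-1} + a_{m+1} - (2-\delta_m)\,a_m\,r_E = 0.$$

If we assume that $a_m = \cos(m\alpha)$, and we insert this for $\frac{a_{m-1}+a_{m+1}}{2-\delta_m} = a_m\,r_E$, we
recognize by inserting the above theorem $\cos(m\alpha)\cos(\alpha) = \frac{\cos[(m+1)\alpha] + \cos[(m-1)\alpha]}{2}$,

$$\frac{\cos[(m+1)\alpha] + \cos[(m-1)\alpha]}{2-\delta_m} = \cos(m\alpha)\,r_E = \cos(m\alpha)\cos(\alpha),$$

that $r_E = \cos\alpha$. And to maximize $r_E$ by constraining that $a_{N+1} = 0$, we get the
smallest-possible spread $\alpha = \pm\frac{\pi/2}{N+1} = \pm\frac{90°}{N+1}$,

$$a_m = \begin{cases}\cos\Big(\frac{\pi}{2}\,\frac{m}{N+1}\Big), & \text{for } 0 \leq m \leq N,\\ 0, & \text{elsewhere}.\end{cases} \tag{A.21}$$

The max-$r_E$ panning function in 2D consequently is

$$g_N(\varphi) = \sum_{m=-N}^{N} a_{|m|}\,\Phi_m(\varphi_s)\,\Phi_m(\varphi) = \sum_{m=-N}^{N}\cos\Big(\frac{\pi}{2}\,\frac{|m|}{N+1}\Big)\,\Phi_m(\varphi_s)\,\Phi_m(\varphi). \tag{A.22}$$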
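
As a small numerical illustration of (A.20) and (A.21), the sketch below computes the 2D max-$r_E$ weights and evaluates $r_E$ from the weights alone. It assumes the ratio form $r_E = 2\sum_{i=1}^{N} a_i a_{i-1} \,/\, \sum_{m=0}^{N}(2-\delta_m)\,a_m^2$, which is my own evaluation of the integrals in (A.20); the function names and the order `N = 3` are hypothetical choices.

```python
import math

def max_re_weights_2d(N):
    """Weights a_m = cos(pi/2 * m/(N+1)) for m = 0..N, cf. (A.21)."""
    return [math.cos(math.pi * m / (2.0 * (N + 1))) for m in range(N + 1)]

def r_e_2d(a):
    """Length of r_E for an axisymmetric 2D pattern with weights a[0..N]."""
    num = 2.0 * sum(a[i] * a[i - 1] for i in range(1, len(a)))
    den = sum((1.0 if m == 0 else 2.0) * a[m] ** 2 for m in range(len(a)))
    return num / den

N = 3
a = max_re_weights_2d(N)
r_e = r_e_2d(a)
r_e_basic = r_e_2d([1.0] * (N + 1))  # unweighted series, for comparison only
```

For these weights `r_e` should reproduce $\cos\frac{\pi/2}{N+1}$, and it should exceed the value `r_e_basic` of the unweighted series.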


A.3.5 Towards Spherical Harmonics (3D) 


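The spherical harmonics are harmonics depending only on angular terms. We may
superimpose both parts $\Delta_{\varphi\zeta} = \Delta_\varphi + \Delta_\zeta$ of the Laplacian and solve the eigenproblem
$r^2\Delta_{\varphi\zeta}\,Y = -\lambda\,Y$,

$$\frac{1}{1-\zeta^2}\frac{\partial^2}{\partial \varphi^2}Y - 2\zeta\frac{\partial}{\partial \zeta}Y + (1-\zeta^2)\frac{\partial^2}{\partial \zeta^2}Y = -\lambda\,Y.$$

We assume $Y$ to be a product of the azimuth harmonics $\Phi_m(\varphi)$ from above and
undefined zenith harmonics $\Theta(\zeta)$

$$Y = \Phi_m\,\Theta, \tag{A.23}$$

which yields a differential equation ($\partial \to d$) only in $\zeta$ after inserting $\frac{\partial^2}{\partial \varphi^2}\Phi_m = -m^2\,\Phi_m$

$$-\frac{m^2}{1-\zeta^2}\,\Phi_m\,\Theta - 2\zeta\,\Phi_m\frac{d}{d\zeta}\Theta + (1-\zeta^2)\,\Phi_m\frac{d^2}{d\zeta^2}\Theta = -\lambda\,\Phi_m\,\Theta.$$

And after dividing by $\Phi_m$, we obtain the associated Legendre differential equation

$$(1-\zeta^2)\frac{d^2}{d\zeta^2}\Theta - 2\zeta\frac{d}{d\zeta}\Theta + \Big[\lambda - \frac{m^2}{1-\zeta^2}\Big]\Theta = 0. \tag{A.24}$$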


A.3.6 Zenithal Solution: Associated Legendre Differential 
Equation 


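The associated Legendre differential equation (written in x and y for mathematical
simplicity) is

$$(1-x^2)\,y'' - 2x\,y' + \Big[\lambda - \frac{m^2}{1-x^2}\Big]y = 0,$$

or after gathering the derivatives

$$\big[(1-x^2)\,y'\big]' + \Big[\lambda - \frac{m^2}{1-x^2}\Big]y = 0.$$

Simplifying the differential equation by substitution. In the associated Legendre differential
equation, we would like to get rid of the denominator $\frac{1}{1-x^2}$. In this case, it is typical
to substitute $y = (1-x^2)^\alpha\,v$ and try out which $\alpha$ succeeds. For insertion into the
differential equation, the derivative of $y$ is

$$y' = -2\alpha\,(1-x^2)^{\alpha-1}\,x\,v + (1-x^2)^\alpha\,v',$$

and the second-order derivative term is

$$\begin{aligned}
\big[(1-x^2)\,y'\big]' &= \big[-2\alpha\,(1-x^2)^\alpha\,x\,v + (1-x^2)^{\alpha+1}\,v'\big]' \\
&= 4\alpha^2(1-x^2)^{\alpha-1}x^2\,v - 2\alpha\,(1-x^2)^\alpha\,v - 2\alpha\,(1-x^2)^\alpha\,x\,v' \\
&\quad - 2(\alpha+1)(1-x^2)^\alpha\,x\,v' + (1-x^2)^{\alpha+1}\,v'' \\
&= (1-x^2)^\alpha\Big[\frac{4\alpha^2x^2}{1-x^2}\,v - 2\alpha\,v - 2(2\alpha+1)\,x\,v' + (1-x^2)\,v''\Big].
\end{aligned}$$

Together with the term $\big[\lambda - \frac{m^2}{1-x^2}\big]y$, the associated Legendre differential equation
becomes

$$(1-x^2)^\alpha\Big[\frac{4\alpha^2x^2}{1-x^2}\,v - 2\alpha\,v + \Big(\lambda - \frac{m^2}{1-x^2}\Big)v - 2(2\alpha+1)\,x\,v' + (1-x^2)\,v''\Big] = 0,$$

$$\frac{4\alpha^2x^2 - m^2}{1-x^2}\,v + (\lambda - 2\alpha)\,v - 2(2\alpha+1)\,x\,v' + (1-x^2)\,v'' = 0.$$

We see that the term $\frac{1}{1-x^2}$ entirely cancels by $\alpha = \frac{m}{2}$, which fixes the substitution

$$y = \sqrt{1-x^2}^{\,m}\,v. \tag{A.25}$$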


Note that for rotationally symmetric solutions around the Cartesian z coordinate,
the choice of $m = 0$ would ensure a constant azimuthal part $\Phi_m = \mathrm{const}$. Re-inserting
$x = \zeta = \cos\vartheta$, the preceding term $\sqrt{1-\cos^2\vartheta}^{\,m} = \sin^m\vartheta$ is understandably
required to represent shapes that are not rotationally symmetric around z, but around
any other, freely rotated axis, for which we also required the sinusoids in 2D. The
differential equation for $v = v(\cos\vartheta)$ is

$$(1-x^2)\,v'' - 2(m+1)\,x\,v' + [\lambda - m(m+1)]\,v = 0. \tag{A.26}$$

Still, the above equation is singular at $x = \pm 1$, which means that the
second-derivative term multiplied by $(1-x^2)$ vanishes there, rendering the differential
equation into a first-order differential equation, locally. Instead of the more
comprehensive Frobenius method we keep it simple: Desired spherical polynomials


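$$Y_n^m = \Phi_m\,\Theta_n^m = \mathcal{P}_n(\theta_x, \theta_y, \theta_z) \quad\text{with}\quad \Phi_m(\varphi) = \frac{\mathcal{P}_m(\theta_x, \theta_y)}{\sqrt{\theta_x^2+\theta_y^2}^{\,m}} = \frac{\mathcal{P}_m(\theta_x, \theta_y)}{\sqrt{1-\theta_z^2}^{\,m}}$$

(with $\mathcal{P}_k$ denoting a polynomial of kth order) imply that $\Theta_n^m$ must contain
$\sqrt{1-\theta_z^2}^{\,m}\,\mathcal{P}_{n-m}(\theta_z)$ to be polynomial and nth-order: in
condensed notation this is $y = \sqrt{1-x^2}^{\,m}\sum_k a_k\,x^k$, see also [4].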


Power-series for $v$. With $v = \sum_{k=0}^{\infty} a_k\,x^k$, we get after inserting and deriving

$$(1-x^2)\sum_{k=2}^{\infty} k(k-1)\,a_k\,x^{k-2} - 2(m+1)\,x\sum_{k=1}^{\infty} k\,a_k\,x^{k-1} + [\lambda - m(m+1)]\sum_{k=0}^{\infty} a_k\,x^k = 0,$$

$$\sum_{k=2}^{\infty} k(k-1)\,a_k\,x^{k-2} - \sum_{k=2}^{\infty} k(k-1)\,a_k\,x^k - 2(m+1)\sum_{k=1}^{\infty} k\,a_k\,x^k + [\lambda - m(m+1)]\sum_{k=0}^{\infty} a_k\,x^k = 0.$$

For $k \geq 2$, all sum terms are present and the comparison of coefficients for the kth
power yields:

$$(k+1)(k+2)\,a_{k+2} = \big[k(k-1) + 2(m+1)\,k - [\lambda - m(m+1)]\big]\,a_k,$$

$$a_{k+2} = \frac{k(k+2m+1) + m(m+1) - \lambda}{(k+1)(k+2)}\,a_k.$$

Typically for such a two-step recurrence, two starting conditions $a_0 = 1$, $a_1 = 0$ and
$a_0 = 0$, $a_1 = 1$ yield a pair of linearly independent solutions (even and odd).

If the series in x should converge, it will most certainly do so when $v$ is polynomial
and stops at some order. To design $y$ to be of some arbitrary finite order $n \in \mathbb{Z}$, we
take into account that $\sqrt{1-x^2}^{\,m}$ is of mth order already, so the polynomial $v$ must be
of $(n-m)$th order, and $|m| \leq n$. The series is forced to stop after the coefficient $a_k$ for
$k = n-m$ if the numerator is forced to become zero by a suitably chosen $\lambda$, thus
$\lambda = (n-m)(n+m+1) + m(m+1) = n(n+1)$. Corresponding to the termination either
at an even or odd $k = n-m$, even $a_0 = 1$, $a_1 = 0$ or odd $a_0 = 0$, $a_1 = 1$ starting
conditions must be chosen. The otherwise wrong-parity solution is an infinite series
[5, Eq. 3.2.45] whose convergence radius $R$ indicates singularities at $x = \pm 1$,

$$R = \lim_{k\to\infty}\Big|\frac{a_k}{a_{k+2}}\Big| = \lim_{k\to\infty}\frac{(k+1)(k+2)}{k(k+2m+1) + m(m+1) - n(n+1)} = \lim_{k\to\infty}\frac{k^2 + \dots}{k^2 + \dots} = 1.$$

(A.27)
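
The termination of the series can be tried out directly. The following sketch (function name and the truncation length `kmax` are hypothetical) runs the two-step recurrence with $\lambda = n(n+1)$ and the starting condition that matches the parity of $n - m$; it is meant as an illustration of the argument above, not as a practical way to evaluate the functions.

```python
def series_coefficients(n, m, kmax=None):
    """Coefficients a_k of v(x) = sum_k a_k x^k from
    a_{k+2} = [k(k+2m+1) + m(m+1) - lambda] / [(k+1)(k+2)] * a_k
    with lambda = n(n+1)."""
    lam = n * (n + 1)
    kmax = (n - m + 4) if kmax is None else kmax
    a = [0.0] * (kmax + 1)
    if (n - m) % 2 == 0:
        a[0] = 1.0   # even starting condition a_0 = 1, a_1 = 0
    else:
        a[1] = 1.0   # odd starting condition a_0 = 0, a_1 = 1
    for k in range(kmax - 1):
        num = k * (k + 2 * m + 1) + m * (m + 1) - lam
        a[k + 2] = num / ((k + 1) * (k + 2)) * a[k]
    return a

v20 = series_coefficients(2, 0)   # proportional to 3x^2 - 1
v31 = series_coefficients(3, 1)   # proportional to 5x^2 - 1
```

For $n = 2$, $m = 0$ this gives $v = 1 - 3x^2$ and for $n = 3$, $m = 1$ it gives $v = 1 - 5x^2$; all coefficients beyond $k = n - m$ are zero, and both results are proportional to the familiar polynomial parts of $P_2$ and $P_3^1$.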


Using $\lambda = n(n+1)$ and writing the differentials in condensed form, the defining
differential equations for associated Legendre functions $P_n^m$ ($m$ is no exponent but a
second index) and their polynomial part $v_n^m$ become

$$\frac{d}{dx}\Big[(1-x^2)\frac{d}{dx}P_n^m\Big] + \Big[n(n+1) - \frac{m^2}{1-x^2}\Big]P_n^m = 0, \tag{A.28}$$

$$(1-x^2)\frac{d^2}{dx^2}v_n^m - 2(m+1)\,x\,\frac{d}{dx}v_n^m + [n(n+1) - m(m+1)]\,v_n^m = 0. \tag{A.29}$$


Orthogonality of associated Legendre functions. The resulting associated Legendre
differential equation

$$\big[(1-x^2)\,[P_n^m]'\big]' + \Big[n(n+1) - \frac{m^2}{1-x^2}\Big]P_n^m = 0,$$

yields a sequence of finite-order functions $P_n^m$ with the order $n \in \mathbb{N}_0$ and $|m| \leq n$.
Before even defining these functions, we can prove their orthogonality $\int_{-1}^{1} P_n^m\,P_l^m\,dx = 0$
for $n \neq l$. This means no product of a pair of associated Legendre functions
of different indices $n \neq l$ produces any constant part on $x \in [-1; 1]$, and $P_n^m$ and $P_l^m$
do not contain shapes of the respective other function. This is important to uniquely
decompose shapes and to define transformation integrals. We multiply the differential
equation with $P_l^m$ and integrate it over x

$$\int_{-1}^{1}\big[(1-x^2)\,[P_n^m]'\big]'\,P_l^m\,dx + \int_{-1}^{1}\Big[n(n+1) - \frac{m^2}{1-x^2}\Big]P_n^m\,P_l^m\,dx = 0.$$

Integration by parts of the first integral yields

$$\int_{-1}^{1}\big[(1-x^2)\,[P_n^m]'\big]'\,P_l^m\,dx = \underbrace{(1-x^2)\,[P_n^m]'\,P_l^m\Big|_{-1}^{1}}_{=0} - \int_{-1}^{1}(1-x^2)\,[P_n^m]'\,[P_l^m]'\,dx, \tag{A.30}$$

where the vanishing part is because of $(1-x^2) = 0$ at the endpoints $x = \pm 1$ where
$[P_n^m]'$ and $P_l^m$ are finite. We get

$$\int_{-1}^{1}(1-x^2)\,[P_n^m]'\,[P_l^m]'\,dx = \int_{-1}^{1}\Big[n(n+1) - \frac{m^2}{1-x^2}\Big]P_n^m\,P_l^m\,dx.$$

We could have arrived at an alternative expression, with the only difference in $l(l+1)$
instead of $n(n+1)$,

$$\int_{-1}^{1}(1-x^2)\,[P_l^m]'\,[P_n^m]'\,dx = \int_{-1}^{1}\Big[l(l+1) - \frac{m^2}{1-x^2}\Big]P_l^m\,P_n^m\,dx,$$

if we had started integrating the differential equation of $P_l^m$ over $P_n^m$, instead. The
difference of both equations is

$$[n(n+1) - l(l+1)]\int_{-1}^{1}P_n^m\,P_l^m\,dx = 0,$$

and the scalar in brackets only vanishes for $n = l$. For the equation to hold at other
$n \neq l$, we conclude that the associated Legendre functions must be orthogonal,
$\int_{-1}^{1}P_n^m\,P_l^m\,dx = 0$. (Orthogonality need not hold for different $m$, as $\Phi_m$ achieves this
orthogonality in azimuth.)


Solving for polynomial part of associated Legendre functions. To solve the
differential equation for the polynomial part $v_n^m$ in a way to arrive at the elegant Rodrigues
formula, we first play with a test function

$u_n = (1-x^2)^n$, differentiated $u_n' = -2n\,x\,(1-x^2)^{n-1} = -2n\,(1-x^2)^{-1}\,x\,u_n$.

We may write its derivative as differential equation

$$(1-x^2)\,u_n' + 2n\,x\,u_n = 0$$

and derive it $l$ times by the Leibniz rule $(f\,g)^{(l)} = \sum_{k=0}^{l}\binom{l}{k}f^{(k)}\,g^{(l-k)}$ for repeated
differentiation of products, with the binomial coefficient $\binom{l}{k} = \frac{l!}{k!\,(l-k)!}$ and
$f^{(k)} = \frac{d^k f}{dx^k}$ for simplicity. The few non-zero derivatives of $x$ and $(1-x^2)$ simplify
differentiation, $x' = 1$, $[(1-x^2)]' = -2x$, $[(1-x^2)]'' = -2$,

$$(1-x^2)\,u_n^{(l+1)} + l\,(-2x)\,u_n^{(l)} + \frac{l(l-1)}{2}\,(-2)\,u_n^{(l-1)} + 2n\,x\,u_n^{(l)} + 2n\,l\,u_n^{(l-1)} = 0,$$

$$(1-x^2)\,u_n^{(l+1)} - 2(l-n)\,x\,u_n^{(l)} + l\,(2n-l+1)\,u_n^{(l-1)} = 0.$$

This equation matches $(1-x^2)\,{v_n^m}'' - 2(m+1)\,x\,{v_n^m}' + [n(n+1) - m(m+1)]\,v_n^m = 0$
by matching the coefficients $l - n = m + 1$, hence $l = m + n + 1$, which
nicely implies $l\,(2n-l+1) = n(n+1) - m(m+1)$,

$$(1-x^2)\,u_n^{(n+m+2)} - 2(m+1)\,x\,u_n^{(n+m+1)} + [n(n+1) - m(m+1)]\,u_n^{(n+m)} = 0.$$

We therefore find the solutions $v_n^m = u_n^{(n+m)} = \frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n$ yielding
$y = \sqrt{1-x^2}^{\,m}\,\frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n$.


Rodrigues formula. By the above, the Rodrigues formula for the associated
Legendre functions $P_n^m$ becomes

$$P_n^m = \frac{(-1)^{n+m}}{2^n\,n!}\,\sqrt{1-x^2}^{\,m}\,\frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n, \tag{A.31}$$

$$\text{or}\quad P_n^m = (-1)^m\,\sqrt{1-x^2}^{\,m}\,\frac{d^m}{dx^m}P_n, \quad\text{with}\quad P_n = \frac{(-1)^n}{2^n\,n!}\,\frac{d^n}{dx^n}(1-x^2)^n,$$

and $P_n = P_n^0$ are the Legendre polynomials. The Legendre polynomials are
normalized to $P_n(1) = 1$ by the factor $\frac{(-1)^n}{2^n\,n!}$. Because $(1-x^2)$ is zero at $x = 1$ with
any positive integer exponent, only the part of its n-fold derivative that
exclusively affects the power of $(1-x^2)^n$ for n times is responsible for its value there:
$n!\,(-2x)^n\,(1-x^2)^0\big|_{x=1} = n!\,2^n\,(-1)^n$. The scaling of the associated Legendre
functions with $m > 0$ is somewhat more arbitrary in sign and value.


Indices n and m. The boundaries for the index $m$ of the Legendre functions $m \in \mathbb{Z}$ are
typically $-n \leq m \leq n$, however due to the shift of the eigenvalue by $-\frac{m^2}{1-x^2}$, functions
for positive and negative $m$ are linearly dependent. We observe this by inspecting the
highest-order terms in [6]

$$\begin{aligned}
2^n n!\,\sqrt{1-x^2}^{\,m}\,P_n^m &= (-1)^m\,(1-x^2)^m\,\frac{d^{n+m}}{dx^{n+m}}(x^2-1)^n \\
&= (-1)^m(-1)^m\,x^{2m}\,\frac{d^{n+m}}{dx^{n+m}}x^{2n} + \dots = \frac{(2n)!}{(n-m)!}\,x^{n+m} + \dots,
\end{aligned}$$

$$2^n n!\,\sqrt{1-x^2}^{\,m}\,P_n^{-m} = (-1)^m\,\frac{d^{n-m}}{dx^{n-m}}(x^2-1)^n = (-1)^m\,\frac{(2n)!}{(n+m)!}\,x^{n+m} + \dots,$$

$$\Rightarrow\quad P_n^{-m} = (-1)^m\,\frac{(n-m)!}{(n+m)!}\,P_n^m, \tag{A.32}$$

and to avoid confusion, it is convenient to only use $m \geq 0$, or $|m|$, to evaluate the
associated Legendre functions.


Alternative definition: three-term recurrence. Any polynomial $\mathcal{P}_n$ of the order $n$
can be decomposed into Legendre polynomials $\mathcal{P}_n = \sum_{i=0}^{n} c_i\,P_i$, and the Legendre
polynomial $P_j$ is orthogonal to all those Legendre polynomials, $\int_{-1}^{1}\mathcal{P}_n\,P_j\,dx = 0$ if
$j > n$. With this knowledge it is interesting to describe $\int_{-1}^{1}(x\,P_i)\,P_j\,dx$. As $(x\,P_i)$ is
of $(i+1)$th order, the integral must vanish for $j > i+1$. Because of commutativity,
$\int_{-1}^{1}P_i\,(x\,P_j)\,dx$, and $(x\,P_j)$ being of $(j+1)$th order, it also vanishes for $i > j+1$.
Hereby, re-expansion of $x\,P_n$ can maximally use three terms, $x\,P_n = \alpha\,P_{n-1} + \gamma\,P_n + \beta\,P_{n+1}$.
In fact only two terms remain as $P_{2k}$ are even functions on $x \in [-1; 1]$ and
$P_{2k+1}$ are odd, thus orthogonal. The product $x\,P_n$ changes the parity of $P_n$, leaving
$x\,P_n = \alpha\,P_{n-1} + \beta\,P_{n+1}$. At $x = 1$ all polynomials were normalized to $P_i(1) = 1$,
therefore evaluation at $x = 1$ leaves $1 = \alpha + \beta$, so $\alpha = 1 - \beta$, hence

$$x\,P_n = \beta_n\,P_{n+1} + (1-\beta_n)\,P_{n-1}.$$

As also the associated Legendre functions $P_n^m$ for a specific $m$ are orthogonal, the
recurrence is more general

$$x\,P_n^m = \beta_n^m\,P_{n+1}^m + (1-\beta_n^m)\,P_{n-1}^m.$$

To determine the coefficient $\beta_n^m$, we only need to find out how the highest-power
coefficients of $x^{n-m+1}$ of the polynomial parts in $x\,P_n^m$ and $P_{n+1}^m$ are related. We see this after
inserting $P_n^m = \sqrt{1-x^2}^{\,m}\,\frac{(-1)^{n+m}}{2^n\,n!}\,\frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n$ and division by $\sqrt{1-x^2}^{\,m}\,\frac{(-1)^{n+m}}{2^n\,n!}$,
which leaves a recurrence for the polynomial part

$$\underbrace{x\,v_n^m}_{O = n-m+1} = -\frac{\beta_n^m}{2(n+1)}\underbrace{v_{n+1}^m}_{O = n-m+1} - 2n\,(1-\beta_n^m)\underbrace{v_{n-1}^m}_{O = n-m-1}.$$

Of the highest powers $x^{n-m+1}$ in both $x\,v_n^m$ and $v_{n+1}^m$ the coefficients $c^m_{n,n-m}$ and
$c^m_{n+1,n-m+1}$ define

$$\beta_n^m = -2(n+1)\,\frac{c^m_{n,n-m}}{c^m_{n+1,n-m+1}}.$$

To find it, we binomially expand $(1-x^2)^n$ to $(1-x^2)^n = (-1)^n\sum_{k=0}^{n}(-1)^k\binom{n}{k}\,x^{2(n-k)}$,

$$v_n^m = \frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n = (-1)^n\sum_{k}(-1)^k\binom{n}{k}\frac{[2(n-k)]!}{(n-m-2k)!}\,x^{n-m-2k},$$

so that with $k = 0$ we can find $c^m_{n,n-m} = \frac{(-1)^n\,(2n)!}{(n-m)!}$ for the highest-power
coefficient of $v_n^m$. Accordingly, the coefficient of the recurrence is

$$\beta_n^m = 2(n+1)\,\frac{n-m+1}{(2n+1)(2n+2)} = \frac{n-m+1}{2n+1},$$

hence with $1 - \beta_n^m = \frac{n+m}{2n+1}$ and $x\,P_n^m = \frac{n-m+1}{2n+1}\,P_{n+1}^m + \frac{n+m}{2n+1}\,P_{n-1}^m$, we can construct
$P_n^m$ recursively by

$$P_{n+1}^m = \frac{2n+1}{n-m+1}\,x\,P_n^m - \frac{n+m}{n-m+1}\,P_{n-1}^m.$$

The start value is $P_m^m = \frac{(-1)^{2m}}{2^m\,m!}\,\sqrt{1-x^2}^{\,m}\,\frac{d^{2m}}{dx^{2m}}(1-x^2)^m = \frac{(-1)^m\,(2m)!}{2^m\,m!}\,\sqrt{1-x^2}^{\,m}$, and for
$n = m$, the term $P_{n-1}^m$ is excluded.

(A.33)
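
A minimal sketch of the recursive construction (A.33) is given below. It assumes $0 \leq m \leq n$ and keeps the factor $(-1)^m$ of the Rodrigues formula (A.31); the function name is hypothetical. The results can be compared with the closed forms $P_2 = \frac{3x^2-1}{2}$, $P_1^1 = -\sqrt{1-x^2}$, $P_2^1 = -3x\sqrt{1-x^2}$, and $P_2^2 = 3(1-x^2)$.

```python
import math

def assoc_legendre(n, m, x):
    """P_n^m(x) for 0 <= m <= n from the start value P_m^m and the
    three-term recurrence (A.33); includes the factor (-1)^m."""
    # Start value P_m^m = (-1)^m (2m)! / (2^m m!) * sqrt(1 - x^2)^m.
    p_prev = 0.0  # the term P_{m-1}^m is excluded
    p = ((-1) ** m * math.factorial(2 * m) / (2 ** m * math.factorial(m))
         * math.sqrt(1.0 - x * x) ** m)
    for k in range(m, n):
        # P_{k+1}^m = [(2k+1) x P_k^m - (k+m) P_{k-1}^m] / (k-m+1)
        p, p_prev = ((2 * k + 1) * x * p - (k + m) * p_prev) / (k - m + 1), p
    return p

x = 0.3
s = math.sqrt(1.0 - x * x)
p20 = assoc_legendre(2, 0, x)
p11 = assoc_legendre(1, 1, x)
p21 = assoc_legendre(2, 1, x)
p22 = assoc_legendre(2, 2, x)
p31 = assoc_legendre(3, 1, x)
```

Each value is obtained with $n - m$ recurrence steps, which avoids evaluating high-order derivatives of $(1-x^2)^n$ explicitly.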


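Normalization. A unity square integral (orthonormalization) simplifies the definition
of transform integrals. We would like to obtain the corresponding factor $N_n^m$ with

$$\int_{-1}^{1}(P_n^m\,N_n^m)^2\,dx = 1.$$

Normalization for $m = 0$ is easy to find by repeated integration by parts

$$\begin{aligned}
(2^n n!)^2\int_{-1}^{1}P_n\,P_n\,dx &= \underbrace{\big[(1-x^2)^n\big]^{(n-1)}\big[(1-x^2)^n\big]^{(n)}\Big|_{-1}^{1}}_{=0} - \int_{-1}^{1}\big[(1-x^2)^n\big]^{(n-1)}\big[(1-x^2)^n\big]^{(n+1)}\,dx \\
&= \dots = (-1)^n\int_{-1}^{1}(1-x^2)^n\,\big[(1-x^2)^n\big]^{(2n)}\,dx \\
&= \int_{-1}^{1}(1-x^2)^n\,(2n)!\,dx = (2n)!\int_0^{\pi}\sin^{2n}\vartheta\,\sin\vartheta\,d\vartheta.
\end{aligned}$$

With the integral $\int_0^{\pi}\sin^{2n+1}\vartheta\,d\vartheta = 2\,\frac{2^{2n}\,n!^2}{(2n+1)!}$ this is $(N_n)^{-2} = \frac{(2n)!}{2^{2n}\,n!^2}\,\frac{2\cdot 2^{2n}\,n!^2}{(2n+1)!} = \frac{2}{2n+1}$.
For $N_n^m$, a trick to insert the relation between $P_n^m$ and $P_n^{-m}$ is used [6], and
integration by parts until the differentials are of the same order

$$\begin{aligned}
\frac{1}{(N_n^m)^2} &= \int_{-1}^{1}P_n^m\,P_n^m\,dx = (-1)^m\,\frac{(n+m)!}{(n-m)!}\int_{-1}^{1}P_n^m\,P_n^{-m}\,dx \\
&= \frac{(-1)^m}{(2^n n!)^2}\,\frac{(n+m)!}{(n-m)!}\int_{-1}^{1}\big[(1-x^2)^n\big]^{(n+m)}\big[(1-x^2)^n\big]^{(n-m)}\,dx \\
&= \frac{(n+m)!}{(n-m)!}\underbrace{\frac{1}{(2^n n!)^2}\int_{-1}^{1}\big[(1-x^2)^n\big]^{(n)}\big[(1-x^2)^n\big]^{(n)}\,dx}_{=1/(N_n)^2} = \frac{2}{2n+1}\,\frac{(n+m)!}{(n-m)!},
\end{aligned}$$

$$\Rightarrow\quad N_n^m = (-1)^m\sqrt{\frac{(2n+1)}{2}\,\frac{(n-m)!}{(n+m)!}}. \tag{A.34}$$

The $(-1)^m$ can be excluded if not used in the Rodrigues formula. (It is always a wise
idea to check and compare signs as conventions may differ; in practice $(-1)^m$ is a
rotation around z by 180°.)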


A.3.7 Spherical Harmonics 


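With all the above definitions, we obtain the fully normalized spherical harmonics

$$Y_n^m(\varphi, \vartheta) = N_n^{|m|}\,P_n^{|m|}(\cos\vartheta)\,\Phi_m(\varphi). \tag{A.35}$$

Orthonormality. They are orthonormal when integrated over the sphere

$$\int_{\mathbb{S}^2}Y_n^m\,Y_{n'}^{m'}\,d\cos\vartheta\,d\varphi = \delta_{nn'}\,\delta_{mm'}. \tag{A.36}$$

Transform integral. Because of their completeness in the Hilbert space, any
square-integrable function $g(\varphi, \vartheta)$ can be decomposed by

$$g(\varphi, \vartheta) = \sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'}\gamma_{n'm'}\,Y_{n'}^{m'}(\varphi, \vartheta). \tag{A.37}$$

From a known function $g(\varphi, \vartheta)$, the coefficients are obtained by integrating $g$
with another spherical harmonic $Y_n^m$ over the unit sphere $\int_{\mathbb{S}^2} = \int_{-1}^{1}d\cos\vartheta\int_0^{2\pi}d\varphi$.
For a simple notation, we gather the two variables in a direction vector
$\boldsymbol{\theta} = [\cos\varphi\sin\vartheta,\ \sin\varphi\sin\vartheta,\ \cos\vartheta]^\mathrm{T}$ and write

$$\int_{\mathbb{S}^2}g(\boldsymbol{\theta})\,Y_n^m\,d\boldsymbol{\theta} = \sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'}\gamma_{n'm'}\underbrace{\int_{\mathbb{S}^2}Y_{n'}^{m'}\,Y_n^m\,d\boldsymbol{\theta}}_{=\delta_{nn'}\,\delta_{mm'}} = \gamma_{nm}. \tag{A.38}$$

Parseval's theorem. Due to orthonormality, the integral norm of any pattern $g(\boldsymbol{\theta})$
composed as $\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\gamma_{nm}\,Y_n^m(\boldsymbol{\theta})$ is equivalent to

$$\int_{\mathbb{S}^2}|g(\boldsymbol{\theta})|^2\,d\boldsymbol{\theta} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}|\gamma_{nm}|^2 \tag{A.39}$$

because $\int_{\mathbb{S}^2}\sum_{n,m}\sum_{n',m'}\gamma_{nm}\,\gamma_{n'm'}\,Y_n^m(\boldsymbol{\theta})\,Y_{n'}^{m'}(\boldsymbol{\theta})\,d\boldsymbol{\theta} = \sum_{n,m}\sum_{n',m'}\gamma_{nm}\,\gamma_{n'm'}\,\delta_{nn'}\,\delta_{mm'}$.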

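3D panning functions: Dirac delta on the sphere. An infinitely narrow range around
the desired direction $\boldsymbol{\theta}_s$ can be described by limiting the dot product $\boldsymbol{\theta}_s^\mathrm{T}\boldsymbol{\theta} > \cos\varepsilon \to 1$.
A unit-surface Dirac delta distribution $\delta(1 - \boldsymbol{\theta}_s^\mathrm{T}\boldsymbol{\theta})$ can be described as

$$\delta(1 - \boldsymbol{\theta}_s^\mathrm{T}\boldsymbol{\theta}) = \lim_{\varepsilon\to 0}\begin{cases}\frac{1}{2\pi(1-\cos\varepsilon)}, & \text{for } \arccos\boldsymbol{\theta}_s^\mathrm{T}\boldsymbol{\theta} < \varepsilon,\\ 0, & \text{otherwise}.\end{cases} \tag{A.40}$$

And its coefficients are found by the transformation integral

$$\gamma_{nm} = \int_{\mathbb{S}^2}Y_n^m(\boldsymbol{\theta})\,\delta(1 - \boldsymbol{\theta}_s^\mathrm{T}\boldsymbol{\theta})\,d\boldsymbol{\theta} = \lim_{\varepsilon\to 0}\frac{1}{2\pi(1-\cos\varepsilon)}\int_{\arccos\boldsymbol{\theta}_s^\mathrm{T}\boldsymbol{\theta} < \varepsilon}Y_n^m(\boldsymbol{\theta})\,d\boldsymbol{\theta} = Y_n^m(\boldsymbol{\theta}_s). \tag{A.41}$$

Typically, a finite-order panning function with $n \leq N$ employs a weight $a_n$ to reduce
side lobes

$$g_N(\boldsymbol{\theta}) = \sum_{n=0}^{N}\sum_{m=-n}^{n}a_n\,Y_n^m(\boldsymbol{\theta}_s)\,Y_n^m(\boldsymbol{\theta}). \tag{A.42}$$

Assuming the panning direction is $\boldsymbol{\theta}_s = [0, 0, 1]^\mathrm{T}$, we get the axisymmetric panning
function, with $Y_n^0 = \sqrt{\frac{2n+1}{4\pi}}\,P_n$,

$$g_N(\vartheta) = \sum_{n=0}^{N}a_n\,\frac{2n+1}{4\pi}\,P_n(\cos\vartheta). \tag{A.43}$$

We can evaluate its E measure by integrating $g_N^2$ over the sphere

$$E = \int_0^{2\pi}d\varphi\int_{-1}^{1}g_N^2\,d\zeta = 2\pi\sum_{i,j}a_i\,a_j\,\frac{(2i+1)(2j+1)}{(4\pi)^2}\underbrace{\int_{-1}^{1}P_i\,P_j\,d\zeta}_{=\frac{2\,\delta_{ij}}{2i+1}} = \sum_{n=0}^{N}\frac{2n+1}{4\pi}\,a_n^2. \tag{A.44}$$

The $r_E$ measure is, because of the axisymmetry, perfectly aligned with z, therefore,
its length is calculated by inserting the recurrence $\zeta\,P_j = \frac{(j+1)\,P_{j+1} + j\,P_{j-1}}{2j+1}$,

$$\begin{aligned}
r_E &= \frac{2\pi\int_{-1}^{1}g_N^2\,\zeta\,d\zeta}{E} = \frac{\sum_{i,j}a_i\,a_j\,(2i+1)(2j+1)\int_{-1}^{1}P_i\,\zeta\,P_j\,d\zeta}{8\pi\,E} \\
&= \frac{\sum_{i,j}a_i\,a_j\,(2i+1)\int_{-1}^{1}P_i\,[(j+1)\,P_{j+1} + j\,P_{j-1}]\,d\zeta}{8\pi\,E} \\
&= \frac{\sum_{n}[n\,a_n\,a_{n-1} + (n+1)\,a_n\,a_{n+1}]}{4\pi\,E} = \frac{\sum_{n=1}^{N}n\,a_n\,a_{n-1}}{2\pi\,E}.
\end{aligned} \tag{A.45}$$

3D max-$r_E$. For the narrowest-possible spread, we maximize $r_E$, which we
decompose into $r_E = \frac{\hat{r}_E}{\hat{E}}$ with $\hat{r}_E = 2\sum_{n=1}^{N}n\,a_n\,a_{n-1}$ and $\hat{E} = 4\pi E = \sum_{n=0}^{N}(2n+1)\,a_n^2$,
and we zero its derivative, as for 2D,

$$\frac{\partial r_E}{\partial a_n} = \frac{\hat{r}_E'}{\hat{E}} - \frac{\hat{r}_E\,\hat{E}'}{\hat{E}^2} = \frac{1}{\hat{E}}\big[\hat{r}_E' - \hat{E}'\,r_E\big] = 0
\;\Rightarrow\; n\,a_{n-1} + (n+1)\,a_{n+1} - (2n+1)\,a_n\,r_E = 0. \tag{A.46}$$

If we assume that $a_n = P_n(\zeta)$, we see by $(n+1)\,P_{n+1} + n\,P_{n-1} = (2n+1)\,P_n\,\zeta$

$$(n+1)\,P_{n+1} + n\,P_{n-1} = (2n+1)\,P_n\,\zeta = (2n+1)\,P_n\,r_E \tag{A.47}$$

that $r_E = \zeta$ and $a_n = P_n(\zeta) = P_n(r_E)$. We maximize $r_E$ under the constraint that
$a_{N+1} = P_{N+1}(r_E) = 0$. Therefore, $r_E$ must be as close to 1 as possible, and be a zero of
the Legendre polynomial $P_{N+1}$. It can be found by a root-finding algorithm in
MATLAB, e.g. Newton-Raphson, when the function $P_n$ is implemented. In [7], the
useful approximation $r_E = \cos\frac{137.9°}{N+1.51} = \cos\frac{2.4068}{N+1.51}$ was given.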
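
The root search mentioned above can be sketched in a few lines (here in Python rather than MATLAB). This is an illustration under stated assumptions: Newton-Raphson is started at $x = 1$, the derivative is obtained from the standard identity $P_{k+1}' = P_{k-1}' + (2k+1)\,P_k$, and $r_E$ is re-evaluated from the weights by the ratio $2\sum_{n=1}^{N}n\,a_n a_{n-1} \,/\, \sum_{n=0}^{N}(2n+1)\,a_n^2$, which follows from (A.44) and (A.45). The constants 137.9° and 1.51 are taken from the approximation attributed to [7]; all function names are hypothetical.

```python
import math

def legendre_with_derivative(n, x):
    """Return (P_n(x), P_n'(x)) via (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
    and P'_{k+1} = P'_{k-1} + (2k+1) P_k."""
    p_prev, p = 1.0, x      # P_0, P_1
    d_prev, d = 0.0, 1.0    # P_0', P_1'
    if n == 0:
        return p_prev, d_prev
    for k in range(1, n):
        p_next = ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
        d_next = d_prev + (2 * k + 1) * p
        p_prev, p, d_prev, d = p, p_next, d, d_next
    return p, d

def max_re_3d(N, iterations=50):
    """Largest zero of P_{N+1}, found by Newton-Raphson started at x = 1."""
    x = 1.0
    for _ in range(iterations):
        p, d = legendre_with_derivative(N + 1, x)
        x -= p / d
    return x

def r_e_3d(a):
    """r_E of an axisymmetric pattern with order weights a[0..N]."""
    num = 2.0 * sum(n * a[n] * a[n - 1] for n in range(1, len(a)))
    den = sum((2 * n + 1) * a[n] ** 2 for n in range(len(a)))
    return num / den

N = 3
r_e = max_re_3d(N)
weights = [legendre_with_derivative(n, r_e)[0] for n in range(N + 1)]  # a_n = P_n(r_E)
r_e_approx = math.cos(math.radians(137.9) / (N + 1.51))
```

Started to the right of the largest zero, the Newton iteration approaches it monotonically, so no bracketing is needed in this sketch. For $N = 3$ the exact root and the approximation differ only in the fourth decimal place.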


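Squared norm mirror/rotation invariance. The norm of any pattern $a(\boldsymbol{\theta})$ is invariant
under orthogonal coordinate transform (rotation/mirror) $\tilde{\boldsymbol{\theta}} = \boldsymbol{R}\,\boldsymbol{\theta}$ with $\boldsymbol{R}^\mathrm{T}\boldsymbol{R} = \boldsymbol{I}$,

$$\int_{\mathbb{S}^2}a^2(\boldsymbol{\theta})\,d\boldsymbol{\theta} = \int_{\mathbb{S}^2}b^2(\boldsymbol{\theta})\,d\boldsymbol{\theta}, \quad\text{with}\quad b(\boldsymbol{\theta}) = a(\boldsymbol{R}\,\boldsymbol{\theta}). \tag{A.48}$$

The norm equivalence of the corresponding spherical harmonics coefficients $\alpha_{nm}$ and
$\beta_{nm}$ follows from Parseval's theorem

$$\sum_{n=0}^{\infty}\sum_{m=-n}^{n}|\alpha_{nm}|^2 = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}|\beta_{nm}|^2. \tag{A.49}$$

In vector notation of the coefficients, i.e. $\boldsymbol{\alpha} = [\alpha_{00}, \dots, \alpha_{NN}]^\mathrm{T}$ and $\boldsymbol{\beta} = [\beta_{00}, \dots, \beta_{NN}]^\mathrm{T}$,
this is $\|\boldsymbol{\alpha}\|^2 = \|\boldsymbol{\beta}\|^2$. To fulfill this equivalence, both vectors are related by an
orthogonal matrix $\boldsymbol{\beta} = \boldsymbol{Q}\,\boldsymbol{\alpha}$ with $\boldsymbol{Q}^\mathrm{T}\boldsymbol{Q} = \boldsymbol{I}$, and hence
$\|\boldsymbol{\beta}\|^2 = \boldsymbol{\beta}^\mathrm{T}\boldsymbol{\beta} = \boldsymbol{\alpha}^\mathrm{T}\boldsymbol{Q}^\mathrm{T}\boldsymbol{Q}\,\boldsymbol{\alpha} = \boldsymbol{\alpha}^\mathrm{T}\boldsymbol{\alpha} = \|\boldsymbol{\alpha}\|^2$.
Moreover, rotation/mirroring neither creates components of higher nor
lower orders, so that $\boldsymbol{Q}$ must be block structured

$$\boldsymbol{Q} = \begin{bmatrix}\boldsymbol{Q}_0 & \boldsymbol{0} & \boldsymbol{0} & \cdots\\ \boldsymbol{0} & \boldsymbol{Q}_1 & \boldsymbol{0} & \cdots\\ \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{Q}_2 & \\ \vdots & \vdots & & \ddots\end{bmatrix}. \tag{A.50}$$

The order subspaces in $n$ therefore stay de-coupled, so that the coefficient vectors
for every order-subspace $n$ are related and norm equivalent under mirror/rotation
operations

$$\|\boldsymbol{\alpha}_n\|^2 = \|\boldsymbol{\beta}_n\|^2, \qquad \boldsymbol{\beta}_n = \boldsymbol{Q}_n\,\boldsymbol{\alpha}_n, \qquad \boldsymbol{Q}_n^\mathrm{T}\boldsymbol{Q}_n = \boldsymbol{I}_{2n+1}. \tag{A.51}$$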


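Pseudo all-pass character of the Dirac delta. Dirac delta distributions $\delta(\boldsymbol{\theta}^\mathrm{T}\boldsymbol{\theta}_s - 1)$
yield the coefficients $Y_n^m(\boldsymbol{\theta}_s)$, and due to rotation invariance they yield constant energy
in every spherical harmonic order $n$, regardless of the aiming $\boldsymbol{\theta}_s$. One can determine
the norm for zenithal aiming $\vartheta = 0$, i.e. $\boldsymbol{\theta}_z = [0, 0, 1]^\mathrm{T}$, yielding a non-zero coefficient
for $m = 0$

$$Y_n^m(\boldsymbol{\theta}_z) = \sqrt{\frac{2n+1}{4\pi}}\,\underbrace{P_n(1)}_{=1}\,\delta_m = \sqrt{\frac{2n+1}{4\pi}}\,\delta_m.$$

Because of the rotation invariance we recognize a pseudo-allpass character (Unsöld
theorem) of the spherical harmonics of any order $n$

$$\sum_{m=-n}^{n}|Y_n^m(\boldsymbol{\theta})|^2 = \sum_{m=-n}^{n}|Y_n^m(\boldsymbol{\theta}_z)|^2 = \frac{2n+1}{4\pi} = (2n+1)\,|Y_0^0(\boldsymbol{\theta})|^2.$$

For encoded single-direction Ambisonic signals $\alpha_{nm}(t)$, this implies

$$\sum_{m=-n}^{n}|\alpha_{nm}(t)|^2 = (2n+1)\,|\alpha_{00}(t)|^2. \tag{A.52}$$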
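
The pseudo-allpass property can be checked numerically with a compact implementation of (A.35). The sketch below is self-contained and therefore repeats the recurrence (A.33), the normalization (A.34), and the azimuth harmonics (A.11); the function names and the test direction are arbitrary assumptions.

```python
import math

def assoc_legendre(n, m, x):
    """P_n^m(x), 0 <= m <= n, by recurrence (A.33), including (-1)^m."""
    p_prev = 0.0
    p = ((-1) ** m * math.factorial(2 * m) / (2 ** m * math.factorial(m))
         * math.sqrt(1.0 - x * x) ** m)
    for k in range(m, n):
        p, p_prev = ((2 * k + 1) * x * p - (k + m) * p_prev) / (k - m + 1), p
    return p

def norm(n, m):
    """N_n^m as in (A.34), including (-1)^m."""
    ratio = math.factorial(n - m) / math.factorial(n + m)
    return (-1) ** m * math.sqrt((2 * n + 1) / 2.0 * ratio)

def circ_harm(m, phi):
    """Normalized azimuth harmonic Phi_m(phi), cf. (A.11)."""
    if m < 0:
        return math.sin(-m * phi) / math.sqrt(math.pi)
    if m == 0:
        return 1.0 / math.sqrt(2.0 * math.pi)
    return math.cos(m * phi) / math.sqrt(math.pi)

def sph_harm_real(n, m, phi, theta):
    """Real-valued Y_n^m(phi, theta) as in (A.35)."""
    am = abs(m)
    return norm(n, am) * assoc_legendre(n, am, math.cos(theta)) * circ_harm(m, phi)

def order_energy(n, phi, theta):
    """Sum over m of |Y_n^m|^2 for a single direction."""
    return sum(sph_harm_real(n, m, phi, theta) ** 2 for m in range(-n, n + 1))

energies = [order_energy(n, 0.9, 1.1) for n in range(5)]
```

Each entry of `energies` should equal $\frac{2n+1}{4\pi}$ independently of the chosen direction. With both factors $(-1)^m$ included, the first-order harmonic $Y_1^1$ reduces to $\sqrt{\frac{3}{4\pi}}\sin\vartheta\cos\varphi$.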


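Expected norm in the diffuse field. An ideal diffuse sound field is composed of
directional signals $a(\boldsymbol{\theta}_s, t)$ from all directions $\boldsymbol{\theta}_s$ with no correlation for signals
from different directions, $E\{a(\boldsymbol{\theta}_1, t)\,a(\boldsymbol{\theta}_2, t)\} = \sigma_a^2\,\delta(\boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_2 - 1)$. Its coefficients are
obtained by the integral over the directions

$$\alpha_{nm}(t) = \int_{\mathbb{S}^2}a(\boldsymbol{\theta}_s, t)\,Y_n^m(\boldsymbol{\theta}_s)\,d\boldsymbol{\theta}_s, \tag{A.53}$$

and we can show that not only the directional signals, but also the spherical
harmonic coefficients are orthogonal in expectation, by $E\{a(\boldsymbol{\theta}_1, t)\,a(\boldsymbol{\theta}_2, t)\} = \sigma_a^2\,\delta(\boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_2 - 1)$ and
orthonormality $\int_{\mathbb{S}^2}Y_n^m\,Y_{n'}^{m'}\,d\boldsymbol{\theta} = \delta_{nn'}\,\delta_{mm'}$ of the spherical harmonics

$$\begin{aligned}
E\{\alpha_{nm}(t)\,\alpha_{n'm'}(t)\} &= \int_{\mathbb{S}^2}\int_{\mathbb{S}^2}E\{a(\boldsymbol{\theta}_1, t)\,a(\boldsymbol{\theta}_2, t)\}\,Y_n^m(\boldsymbol{\theta}_1)\,Y_{n'}^{m'}(\boldsymbol{\theta}_2)\,d\boldsymbol{\theta}_1\,d\boldsymbol{\theta}_2 \\
&= \sigma_a^2\int_{\mathbb{S}^2}Y_n^m(\boldsymbol{\theta})\,Y_{n'}^{m'}(\boldsymbol{\theta})\,d\boldsymbol{\theta} = \sigma_a^2\,\delta_{nn'}\,\delta_{mm'}.
\end{aligned} \tag{A.54}$$

In a perfectly diffuse field, we therefore expect the same norm in every spherical
harmonic component per frequency band. We could reformulate this to

$$E\{|\alpha_{nm}(t)|^2\} = E\{|\alpha_{00}(t)|^2\},$$

however the temporal disjointness assumption of SDM only yields drastically
thinned-out content in the individual higher-order spherical harmonics. To cover the
available temporal information from all $(2n+1)$ spherical harmonic signals within
each order $n$ and for a similar formulation as for a single-direction component, we
may re-formulate

$$\sum_{m=-n}^{n}E\{|\alpha_{nm}(t)|^2\} = (2n+1)\,E\{|\alpha_{00}(t)|^2\}. \tag{A.55}$$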


Spherical convolution. By the argumentation used above to prove rotation
invariance, we can argue that isotropic filtering of spherical patterns is invariant under
rotation, and must therefore depend only on the order $n$. Spherical convolution is
defined in [8] by the coefficients $\beta_{nm}$ of a function $b(\boldsymbol{\theta})$ convolved with the
coefficients $a_n$ of a rotationally symmetric shape $a(\boldsymbol{\theta}) = a(\theta_z)$

$$\gamma_{nm} = a_n\,\beta_{nm}. \tag{A.56}$$


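Spherical cap function. A rotationally symmetric spherical cap function at $\pm\frac{\alpha}{2}$
centered around $\vartheta = 0$, briefly $\boldsymbol{\theta}_z$, can be written in terms of a unit-step. We find the
shape coefficients $w_n$ for its spherical harmonic decomposition by

$$u\Big(\cos\vartheta - \cos\frac{\alpha}{2}\Big) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}w_n\,Y_n^m(\boldsymbol{\theta}_z)\,Y_n^m(\boldsymbol{\theta}) = \sum_{n=0}^{\infty}w_n\,P_n(\cos\vartheta)\,\frac{2n+1}{4\pi}, \tag{A.57}$$

where $Y_n^m(\boldsymbol{\theta}_z) = \sqrt{\frac{2n+1}{4\pi}}\,P_n^m(1) = \sqrt{\frac{2n+1}{4\pi}}\,\delta_m$ and $P_n^0 = P_n$ was used. The coefficients
$w_n$ are obtained by integration over Legendre polynomials $\int_0^{2\pi}d\varphi\int_{-1}^{1}P_n(\cos\vartheta)\,d\cos\vartheta$
and for the right-hand side using their orthogonality $\int_{-1}^{1}P_n(\zeta)\,P_{n'}(\zeta)\,d\zeta = \frac{2}{2n+1}\,\delta_{nn'}$,
leaving $w_n = 2\pi\int_{\cos\frac{\alpha}{2}}^{1}P_n(x)\,dx$.

The integral is solved by $2^n n!\,P_n = \frac{d^n v^n}{dx^n}$ with $v = (x^2 - 1)$ after replacing the
innermost derivative by $\frac{dv^n}{dx} = n\,v^{n-1}\,2x$, and Leibniz' rule for repeated derivatives

$$\begin{aligned}
\frac{d^n v^n}{dx^n} &= 2n\,\frac{d^{n-1}}{dx^{n-1}}\big[x\,v^{n-1}\big]
= 2n\Big[\binom{n-1}{0}\frac{d^0 x}{dx^0}\,\frac{d^{n-1}v^{n-1}}{dx^{n-1}} + \binom{n-1}{1}\frac{d^1 x}{dx^1}\,\frac{d^{n-2}v^{n-1}}{dx^{n-2}}\Big] \\
&= 2n\Big[x\,\frac{d^{n-1}v^{n-1}}{dx^{n-1}} + (n-1)\,\frac{d^{n-2}v^{n-1}}{dx^{n-2}}\Big].
\end{aligned}$$

We may increase $n$ by one, observe that the last expression has one fewer differential,
thus is an integrated version, and obtain after re-inserting $2^n n!\,P_n = \frac{d^n v^n}{dx^n}$

$$2^{n+1}(n+1)!\,P_{n+1} = 2(n+1)\Big[2^n n!\,x\,P_n + 2^n n!\,n\int P_n\,dx - 2^n n!\,n\,C\Big],$$

$$\int P_n\,dx = \frac{P_{n+1} - x\,P_n}{n} + C. \tag{A.58}$$

With the definite integration limits $x_0 = \cos\frac{\alpha}{2}$ and 1, $\int_{x_0}^{1}P_n\,dx$ only depends on the
lower boundary as $P_{n+1}(1) - 1\cdot P_n(1) = 0$,

$$\int_{x_0}^{1}P_n\,dx = -\frac{P_{n+1}(x_0) - x_0\,P_n(x_0)}{n}, \tag{A.59}$$

$$w_n = -2\pi\,\frac{P_{n+1}(\cos\frac{\alpha}{2}) - \cos\frac{\alpha}{2}\,P_n(\cos\frac{\alpha}{2})}{n} \quad\text{for } n > 0, \tag{A.60}$$

and $w_0 = 2\pi\int_{\cos\frac{\alpha}{2}}^{1}dx = 2\pi\,(1 - \cos\frac{\alpha}{2})$ for $n = 0$.
The recurrence $(2n+1)\,x\,P_n - (n+1)\,P_{n+1} = n\,P_{n-1}$ yields alternatively for $n > 0$

$$w_n = 2\pi\,\frac{-\cos\frac{\alpha}{2}\,P_n(\cos\frac{\alpha}{2}) + P_{n-1}(\cos\frac{\alpha}{2})}{n+1}.$$

(A.61)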
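
The closed form of the cap coefficients can be compared against a brute-force evaluation of $w_n = 2\pi\int_{\cos\frac{\alpha}{2}}^{1}P_n(x)\,dx$. The following sketch assumes the form $w_n = -2\pi\,[P_{n+1}(x_0) - x_0\,P_n(x_0)]/n$ with $x_0 = \cos\frac{\alpha}{2}$ that results from (A.58); the midpoint rule, the number of steps, and the 60° opening angle are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) by the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def cap_coefficient(n, alpha):
    """w_n of a spherical cap with opening angle alpha (+-alpha/2 around zenith)."""
    x0 = math.cos(alpha / 2.0)
    if n == 0:
        return 2.0 * math.pi * (1.0 - x0)
    return -2.0 * math.pi * (legendre(n + 1, x0) - x0 * legendre(n, x0)) / n

def cap_coefficient_numeric(n, alpha, steps=20000):
    """Midpoint-rule evaluation of 2*pi * integral of P_n from cos(alpha/2) to 1."""
    x0 = math.cos(alpha / 2.0)
    dx = (1.0 - x0) / steps
    return 2.0 * math.pi * dx * sum(legendre(n, x0 + (k + 0.5) * dx)
                                    for k in range(steps))

alpha = math.radians(60.0)
closed = [cap_coefficient(n, alpha) for n in range(5)]
numeric = [cap_coefficient_numeric(n, alpha) for n in range(5)]
```

For $n = 1$ the closed form reduces to $\pi\,(1 - \cos^2\frac{\alpha}{2})$, which is a convenient sanity check.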


A.4 Encoding to SH and Decoding to SH 


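Mode-matching decoder: L loudspeakers driven by the weights $g_l$ and given by their
directions $\{\boldsymbol{\theta}_l\}$ produce a pattern $f(\boldsymbol{\theta})$ linearly composed of Dirac deltas

$$f(\boldsymbol{\theta}) = \sum_{l=1}^{L}\delta(\boldsymbol{\theta}^\mathrm{T}\boldsymbol{\theta}_l - 1)\,g_l \tag{A.62}$$

$$= \sum_{n=0}^{\infty}\sum_{m=-n}^{n}Y_n^m(\boldsymbol{\theta})\sum_{l=1}^{L}Y_n^m(\boldsymbol{\theta}_l)\,g_l = \boldsymbol{y}(\boldsymbol{\theta})^\mathrm{T}\,\boldsymbol{Y}\,\boldsymbol{g},$$

in an order-unlimited representation. The vector $\boldsymbol{y} = [Y_n^m(\boldsymbol{\theta})]_{nm}$ contains all the
spherical harmonics $0 \leq n \leq \infty$, $-n \leq m \leq n$, in a suitable order, e.g. Ambisonic
Channel Number (ACN) $n^2 + m + n$, and the matrix $\boldsymbol{Y} = [\boldsymbol{y}(\boldsymbol{\theta}_l)]_l$ contains the
spherical harmonic coefficient vectors of every loudspeaker. Obviously, the spherical
harmonic coefficients synthesized by the loudspeakers are $\boldsymbol{\phi} = \boldsymbol{Y}\,\boldsymbol{g}$, so that

$$f(\boldsymbol{\theta}) = \boldsymbol{y}(\boldsymbol{\theta})^\mathrm{T}\,\boldsymbol{Y}\,\boldsymbol{g} = \boldsymbol{y}(\boldsymbol{\theta})^\mathrm{T}\,\boldsymbol{\phi}.$$

With L loudspeakers, at most $(N+1)^2 \leq L$ spherical harmonics can be controlled.
Therefore control typically restricts to the under-determined Nth-order subspace

$$\boldsymbol{\phi}_N = \boldsymbol{Y}_N\,\boldsymbol{g},$$

in which we can synthesize any coefficient vector $\boldsymbol{\phi}_N$. To get a finite and
well-determined solution with the exceeding and arbitrary degrees of freedom in $\boldsymbol{g}$, the
least-squares solution for $\boldsymbol{g}$ is searched under the constraint

$$\min\|\boldsymbol{g}\|^2 \quad\text{subject to: } \boldsymbol{\phi}_N = \boldsymbol{Y}_N\,\boldsymbol{g}, \tag{A.63}$$

yielding the cost function with the Lagrange multipliers $\boldsymbol{\lambda}$

$$J(\boldsymbol{g}, \boldsymbol{\lambda}) = \boldsymbol{g}^\mathrm{T}\boldsymbol{g} + (\boldsymbol{\phi}_N - \boldsymbol{Y}_N\,\boldsymbol{g})^\mathrm{T}\boldsymbol{\lambda}.$$

For the optimum in $\boldsymbol{g}$, its derivative to $\boldsymbol{g}$ is zero, and in $\boldsymbol{\lambda}$ the corresponding derivative:

$$\frac{\partial J}{\partial \boldsymbol{g}} = 2\,\boldsymbol{g}_\mathrm{opt} - \boldsymbol{Y}_N^\mathrm{T}\boldsymbol{\lambda} = \boldsymbol{0}, \qquad \frac{\partial J}{\partial \boldsymbol{\lambda}} = \boldsymbol{\phi}_N - \boldsymbol{Y}_N\,\boldsymbol{g} = \boldsymbol{0}.$$

For $\boldsymbol{g}$ the equation yields $\boldsymbol{g}_\mathrm{opt} = \frac{1}{2}\boldsymbol{Y}_N^\mathrm{T}\boldsymbol{\lambda}$, and for $\boldsymbol{\lambda}$ the original constraint $\boldsymbol{\phi}_N = \boldsymbol{Y}_N\,\boldsymbol{g}$
that only allows to insert the optimal $\boldsymbol{g}$, yielding $\boldsymbol{\phi}_N = \boldsymbol{Y}_N\big(\frac{1}{2}\boldsymbol{Y}_N^\mathrm{T}\boldsymbol{\lambda}_\mathrm{opt}\big)$. Inversion of
$\big(\frac{1}{2}\boldsymbol{Y}_N\boldsymbol{Y}_N^\mathrm{T}\big)$ from the left yields the multipliers $\big(\frac{1}{2}\boldsymbol{Y}_N\boldsymbol{Y}_N^\mathrm{T}\big)^{-1}\boldsymbol{\phi}_N = \boldsymbol{\lambda}_\mathrm{opt}$, so that

$$\boldsymbol{g} = \boldsymbol{Y}_N^\mathrm{T}\,(\boldsymbol{Y}_N\boldsymbol{Y}_N^\mathrm{T})^{-1}\,\boldsymbol{\phi}_N. \tag{A.64}$$

The solution is right-inverse to $\boldsymbol{Y}_N$, i.e. $\boldsymbol{Y}_N\big[\boldsymbol{Y}_N^\mathrm{T}(\boldsymbol{Y}_N\boldsymbol{Y}_N^\mathrm{T})^{-1}\big] = \boldsymbol{I}$.

Best-fit encoder by MMSE: When given L samples of a spherical function $g(\boldsymbol{\theta})$ at
the locations $\{\boldsymbol{\theta}_l\}$, we can minimize the mean-square error (MMSE)

$$\min\sum_{l=1}^{L}\Big[g(\boldsymbol{\theta}_l) - \sum_{n=0}^{N}\sum_{m=-n}^{n}Y_n^m(\boldsymbol{\theta}_l)\,\gamma_{nm}\Big]^2$$

to find suitable spherical harmonic coefficients $\gamma_{nm}$. Using the matrix notation from
above, this is

$$\min\|\boldsymbol{e}\|^2 = \min\|\boldsymbol{g} - \boldsymbol{Y}_N^\mathrm{T}\boldsymbol{\gamma}_N\|^2 \tag{A.65}$$

and we find by zeroing the derivative

$$\frac{\partial\|\boldsymbol{e}\|^2}{\partial \boldsymbol{\gamma}_N} = 2\Big(\frac{\partial \boldsymbol{e}}{\partial \boldsymbol{\gamma}_N}\Big)^\mathrm{T}\boldsymbol{e} = -2\,\boldsymbol{Y}_N\,\boldsymbol{e} = 2\,\boldsymbol{Y}_N\boldsymbol{Y}_N^\mathrm{T}\boldsymbol{\gamma}_N - 2\,\boldsymbol{Y}_N\,\boldsymbol{g} = \boldsymbol{0},$$

$$\Rightarrow\quad \boldsymbol{\gamma}_N = (\boldsymbol{Y}_N\boldsymbol{Y}_N^\mathrm{T})^{-1}\,\boldsymbol{Y}_N\,\boldsymbol{g}. \tag{A.66}$$

The resulting matrix is left-inverse to the thin matrix $\boldsymbol{Y}_N^\mathrm{T}$, and can be written in terms
of the more general pseudo inverse $(\boldsymbol{Y}_N^\mathrm{T})^\dagger$.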
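
Both (A.64) and (A.66) can be tried out on a toy example. The sketch below makes several assumptions for illustration only: the order is restricted to $N = 1$ with the real spherical harmonics written out explicitly in ACN order, the loudspeaker layout is a hypothetical octahedron on the coordinate axes, and a small Gaussian elimination replaces a linear-algebra library so that the example stays self-contained.

```python
import math

def sh_first_order(direction):
    """Real spherical harmonics up to N = 1 (ACN order) for a unit vector (x, y, z)."""
    x, y, z = direction
    c0 = 1.0 / math.sqrt(4.0 * math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z

def gram(cols):
    """Y_N Y_N^T, where cols holds the columns y(theta_l) of Y_N."""
    K = len(cols[0])
    return [[sum(c[i] * c[j] for c in cols) for j in range(K)] for i in range(K)]

def decode_gains(directions, phi_N):
    """Minimum-norm gains g = Y_N^T (Y_N Y_N^T)^{-1} phi_N, cf. (A.64)."""
    cols = [sh_first_order(d) for d in directions]
    z = solve(gram(cols), phi_N)
    return [sum(ci * zi for ci, zi in zip(c, z)) for c in cols]

def encode_samples(directions, samples):
    """Best-fit coefficients gamma_N = (Y_N Y_N^T)^{-1} Y_N g, cf. (A.66)."""
    cols = [sh_first_order(d) for d in directions]
    rhs = [sum(c[i] * s for c, s in zip(cols, samples)) for i in range(len(cols[0]))]
    return solve(gram(cols), rhs)

# Hypothetical layout: six directions on the coordinate axes (octahedron).
layout = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
s = 1.0 / math.sqrt(3.0)
phi_N = sh_first_order((s, s, s))        # coefficients of a single direction
gains = decode_gains(layout, phi_N)
resynth = [sum(sh_first_order(d)[i] * g for d, g in zip(layout, gains))
           for i in range(4)]

gamma_true = [0.5, -0.2, 0.1, 0.3]       # arbitrary first-order pattern
samples = [sum(y * c for y, c in zip(sh_first_order(d), gamma_true)) for d in layout]
gamma_fit = encode_samples(layout, samples)
```

Re-synthesizing the coefficients from the gains returns `phi_N`, which is the right-inverse property, and fitting samples of a first-order pattern returns its coefficients, which is the left-inverse property.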


A.5 Covariance Constraint for Binaural Ambisonic 
Decoding 


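The interaural covariance matrix is related to the expectation value of the auto/cross-covariances of the left and right HRTFs:

$$\mathbf{R} = \int_{\mathbb{S}^2}\begin{bmatrix} h_\mathrm{left}(\boldsymbol{\theta},\omega)^* \\ h_\mathrm{right}(\boldsymbol{\theta},\omega)^* \end{bmatrix}\big[\,h_\mathrm{left}(\boldsymbol{\theta},\omega),\ h_\mathrm{right}(\boldsymbol{\theta},\omega)\,\big]\,\mathrm{d}\boldsymbol{\theta} = \int_{\mathbb{S}^2}\mathbf{h}^*(\boldsymbol{\theta},\omega)\,\mathbf{h}^T(\boldsymbol{\theta},\omega)\,\mathrm{d}\boldsymbol{\theta}. \tag{A.67}$$

When specified in terms of spherical harmonic coefficients $h = \mathbf{y}^T\mathbf{h}_\mathrm{SH}$, the integral $\int_{\mathbb{S}^2} h_1^*\,h_2\,\mathrm{d}\boldsymbol{\theta}$ behind any of $\mathbf{R}$'s entries reduces, by the orthonormality of the spherical harmonics, to $\mathbf{h}_{\mathrm{SH},1}^H\int_{\mathbb{S}^2}\mathbf{y}\,\mathbf{y}^T\mathrm{d}\boldsymbol{\theta}\ \mathbf{h}_{\mathrm{SH},2} = \mathbf{h}_{\mathrm{SH},1}^H\mathbf{h}_{\mathrm{SH},2}$, and we obviously only need the inner product between the spherical-harmonic coefficients of the HRTFs.

A very-high-order spherical harmonics HRTF dataset $\mathbf{H}_\mathrm{SH}$ of dimensions $(M+1)^2 \times 2$ with the order $M \gg N$ yields a covariance matrix at every frequency

$$\mathbf{R} = \mathbf{H}_\mathrm{SH}^H\,\mathbf{H}_\mathrm{SH} = \mathbf{X}^H\mathbf{X}$$

that can be factored into a quadratic form of a $2\times 2$ matrix $\mathbf{X}$ by Cholesky factorization, which reduces the degrees of freedom involved to the minimum required size.

The $N$th-order Ambisonically reproduced, high-frequency modified HRTF dataset $\hat{\mathbf{H}}_\mathrm{SH}$ of dimensions $(M+1)^2 \times 2$ also has a $2\times 2$ covariance matrix $\hat{\mathbf{R}}$ that will differ from $\mathbf{R}$, and which we also decompose into Cholesky factors $\hat{\mathbf{X}}$,

$$\hat{\mathbf{R}} = \hat{\mathbf{H}}_\mathrm{SH}^H\,\hat{\mathbf{H}}_\mathrm{SH} = \hat{\mathbf{X}}^H\hat{\mathbf{X}}. \tag{A.68}$$

To equalize $\hat{\mathbf{R}} = \mathbf{R}$, the reproduced HRTF set is corrected by a $2\times 2$ filter matrix $\mathbf{M}$,

$$\hat{\mathbf{H}}_\mathrm{SH,corr} = \hat{\mathbf{H}}_\mathrm{SH}\,\mathbf{M}. \tag{A.69}$$

This is done properly as soon as

$$\mathbf{X}^H\mathbf{X} = \mathbf{M}^H\hat{\mathbf{X}}^H\hat{\mathbf{X}}\,\mathbf{M} = \mathbf{M}^H\hat{\mathbf{X}}^H\underbrace{\mathbf{Q}^H\mathbf{Q}}_{\mathbf{I}}\hat{\mathbf{X}}\,\mathbf{M}, \tag{A.70}$$

and the orthogonal matrix $\mathbf{Q}$ is used to compensate for the degrees of freedom that the Cholesky factors $\mathbf{X}$ and $\hat{\mathbf{X}}$ have in sign, phase, and mixing with regard to each other. We recognize the root and hereby the preliminary solution for $\mathbf{M}$

$$\mathbf{X} = \mathbf{Q}\,\hat{\mathbf{X}}\,\mathbf{M}, \quad\Rightarrow\quad \mathbf{M} = \hat{\mathbf{X}}^{-1}\mathbf{Q}^H\mathbf{X}. \tag{A.71}$$

This leaves $\hat{\mathbf{H}}_\mathrm{SH,corr} = \hat{\mathbf{H}}_\mathrm{SH}\,\hat{\mathbf{X}}^{-1}\mathbf{Q}^H\mathbf{X}$ depending on an unspecific orthogonal $2\times 2$ matrix $\mathbf{Q}$. To obtain corrected-covariance HRTFs $\hat{\mathbf{H}}_\mathrm{SH,corr}$ of highest-possible phase-alignment and correlation to their uncorrected counterpart $\hat{\mathbf{H}}_\mathrm{SH}$, we maximize the trace, i.e. the sum of diagonal elements

$$\max_\mathbf{Q}\,\mathrm{Re\,Tr}\{\hat{\mathbf{H}}_\mathrm{SH}^H\hat{\mathbf{H}}_\mathrm{SH,corr}\} = \max_\mathbf{Q}\,\mathrm{Re\,Tr}\{\hat{\mathbf{H}}_\mathrm{SH}^H\hat{\mathbf{H}}_\mathrm{SH}\,\hat{\mathbf{X}}^{-1}\mathbf{Q}^H\mathbf{X}\} = \max_\mathbf{Q}\,\mathrm{Re\,Tr}\{\hat{\mathbf{X}}^H\hat{\mathbf{X}}\,\hat{\mathbf{X}}^{-1}\mathbf{Q}^H\mathbf{X}\}$$
$$= \max_\mathbf{Q}\,\mathrm{Re\,Tr}\{\hat{\mathbf{X}}^H\mathbf{Q}^H\mathbf{X}\} = \max_\mathbf{Q}\,\mathrm{Re\,Tr}\{\mathbf{Q}^H\mathbf{X}\hat{\mathbf{X}}^H\}.$$

For the last expression, the property $\mathrm{Tr}\{\mathbf{A}\mathbf{B}\} = \mathrm{Tr}\{\mathbf{B}\mathbf{A}\}$ was used. An orthogonal matrix $\mathbf{Q}^H = \mathbf{V}\mathbf{U}^H$ composed of two orthogonal matrices $\mathbf{U}$ and $\mathbf{V}$ would yield $\mathrm{Tr}\{\mathbf{V}\mathbf{U}^H\mathbf{X}\hat{\mathbf{X}}^H\} = \mathrm{Tr}\{\mathbf{U}^H\mathbf{X}\hat{\mathbf{X}}^H\mathbf{V}\}$, and it would maximize the trace if $\mathbf{U}$ and $\mathbf{V}^H$ diagonalized $\mathbf{X}\hat{\mathbf{X}}^H$. This is accomplished by singular-value decomposition (SVD) $\mathbf{X}\hat{\mathbf{X}}^H = \mathbf{U}\mathbf{S}\mathbf{V}^H$, when singular values $\mathbf{S} = \mathrm{diag}\{[s_1, s_2]\}$ are real and positive, as in most SVD implementations. Using $\mathbf{U}$ and $\mathbf{V}$, the desired solution is:

$$\hat{\mathbf{H}}_\mathrm{SH,corr} = \hat{\mathbf{H}}_\mathrm{SH}\,\hat{\mathbf{X}}^{-1}\mathbf{V}\mathbf{U}^H\mathbf{X}. \tag{A.72}$$

If the SVD delivers negative or complex-valued singular values, the complex/negative factor just needs to be pulled out and factored into either the corresponding left or right singular vector.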
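The covariance correction can be tried out on synthetic data. The sketch below assumes random complex matrices as stand-ins for the two HRTF coefficient sets at one frequency (with one column per ear); `chol_upper` and all other names are hypothetical helpers, not part of any referenced toolbox.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 49   # hypothetical number of spherical-harmonic coefficients per ear

def random_complex(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H = random_complex((K, 2))                  # stand-in for the high-order reference set
H_hat = H + 0.5 * random_complex((K, 2))    # stand-in for the degraded low-order set

def chol_upper(R):
    """Upper-triangular X with R = X^H X (NumPy returns the lower factor L, R = L L^H)."""
    return np.linalg.cholesky(R).conj().T

X = chol_upper(H.conj().T @ H)
X_hat = chol_upper(H_hat.conj().T @ H_hat)

# SVD of X X_hat^H = U S V^H, then M = X_hat^{-1} V U^H X as in Eqs. (A.71)-(A.72)
U, s, Vh = np.linalg.svd(X @ X_hat.conj().T)
M = np.linalg.solve(X_hat, Vh.conj().T @ U.conj().T @ X)
H_corr = H_hat @ M

R = H.conj().T @ H
R_corr = H_corr.conj().T @ H_corr
trace = np.trace(H_hat.conj().T @ H_corr)
```

After the correction, `R_corr` matches `R`, and the trace that was maximized equals the sum of the singular values `s`.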


A.6 Physics of the Helmholtz Equation 


A.6.1 Adiabatic Compression 


We search for a physical compression equation relating pressure p and volume V. 


Ideal gas. The gas pressure $p$ inside the volume $V$ obeys the ideal gas law [9]

$$p\,V = n\,R\,T, \tag{A.73}$$

with $n$ measuring the amount of substance in moles, $R$ the gas constant, and $T$ the temperature. This would yield a valid compression equation if the medium of sound propagation was isothermal. However, this is not the case, $T \neq \mathrm{const}$, as local temperature fluctuations happen too fast to be equalized by thermal dissipation. Isothermal compression would be too soft, and the resulting speed of sound would be off by $-15\%$. A compression law involving fluctuations of all three quantities $(p, V, T)$ needs an additional equation.


First law of thermodynamics. In thermodynamics [10-12], the enthalpy $H$ describes the energy required to heat up a freely expanding gas under constant pressure $p$. The enthalpy goes to the internal energy $U$ required to heat up the gas in a constant volume, which is easier, plus the ideal-gas volume work $pV$ taken by the gas to expand under the constant external pressure

$$H = U + p\,V, \tag{A.74}$$
$$\text{specifically}\quad n\,c_p\,T = n\,c_V\,T + n\,R\,T, \quad\Rightarrow\quad R = c_p - c_V.$$

The quantities $c_p$ and $c_V$ are the specific heat capacities for heating up a gas that is expanding ($p = \mathrm{const.}$) or confined in a fixed volume ($V = \mathrm{const.}$) to a temperature $T$, which can be accurately measured or modeled. Obviously, the gas constant $R$ is the difference between the two. To make sound propagation isenthalpic, the energy must fluctuate between internal energy $U$ and volume work $pV$.


Adiabatic process. The above steady-state equations are not useful yet to describe short-term fluctuations of $p$, $V$, and $T$ in time and space. A differential formulation related to the change in enthalpy, internal energy, and volume work $\mathrm{d}H = \mathrm{d}U + p\,\mathrm{d}V$ is more useful. Moreover, we regard packages of a constant amount of substance whose internal heat-up is just due to compression and not due to external enthalpy sources; it is therefore isenthalpic, $\mathrm{d}H = 0$, see [10, Sect. 3.12.2], [13]

$$0 = n\,c_V\,\mathrm{d}T + \frac{n\,R\,T}{V}\,\mathrm{d}V.$$

We may divide by $n\,c_V\,T$, replace $R = c_p - c_V$, and obtain $\frac{\mathrm{d}T}{T} + \big(\frac{c_p}{c_V} - 1\big)\frac{\mathrm{d}V}{V} = 0$, whose integration (with the integration constant set to zero) yields $\ln T + \big(\frac{c_p}{c_V} - 1\big)\ln V = \ln\big(T\,V^{\frac{c_p}{c_V}-1}\big) = 0 \Rightarrow T\,V^{\frac{c_p}{c_V}-1} = 1$, and with the ideal gas equation inserted as $T = \frac{pV}{nR}$, the adiabatic process law becomes

$$p\,V^{\frac{c_p}{c_V}} = n\,R = \mathrm{const}, \tag{A.75}$$

for which the adiabatic exponent is frequently expressed as $\gamma = \frac{c_p}{c_V}$. For air, the exponent is $\gamma = 1.4$, and we may express a state change as $(p_0, V_0) \rightarrow (p_0 + p,\ V_0 + V)$. The equation $p_0 V_0^\gamma = (p_0 + p)(V_0 + V)^\gamma$ yields after division by $p_0 V_0^\gamma$ and by $\big(1 + \frac{V}{V_0}\big)^\gamma$

$$1 + \frac{p}{p_0} = \Big(1 + \frac{V}{V_0}\Big)^{-\gamma} \approx 1 - \gamma\frac{V}{V_0}, \qquad\text{hence}\quad p = -\gamma\,p_0\,\frac{V}{V_0}.$$


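Assuming the Cartesian coordinates $x, y, z$ measured in the resting gas to define its volume $V_0 = \Delta x\,\Delta y\,\Delta z$, as well as its deflected coordinates $\xi(x), \eta(y), \zeta(z)$ after a volume change to $V$, we can approximate the volume change well enough by the three independent volume changes $\Delta\xi\,\Delta y\,\Delta z$, $\Delta x\,\Delta\eta\,\Delta z$, and $\Delta x\,\Delta y\,\Delta\zeta$, resulting from the superimposed individual elongation into the three coordinates' directions,

$$\lim_{V_0\to 0}\frac{\Delta V}{V_0} = \lim_{V_0\to 0}\frac{\Delta\xi\,\Delta y\,\Delta z + \Delta x\,\Delta\eta\,\Delta z + \Delta x\,\Delta y\,\Delta\zeta}{\Delta x\,\Delta y\,\Delta z} = \frac{\partial\xi}{\partial x} + \frac{\partial\eta}{\partial y} + \frac{\partial\zeta}{\partial z}.$$

Replacing the bulk modulus $K = \gamma\,p_0 = \rho\,c^2$ of air by more common constants,¹ where $c = \sqrt{K/\rho}$, and applying the derivative in time $\frac{\partial}{\partial t}$, we get the equation of compression in its typical form using the velocities $\frac{\partial\xi}{\partial t} = v_x$, $\frac{\partial\eta}{\partial t} = v_y$, $\frac{\partial\zeta}{\partial t} = v_z$,

$$\frac{\partial p}{\partial t} = -\rho\,c^2\,\nabla^T\mathbf{v}. \tag{A.76}$$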
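The relation $c = \sqrt{K/\rho}$ with $K = \gamma\,p_0$ can be evaluated with the typical constants of air ($\gamma = 1.4$, $p_0 = 10^5$ Pa, $\rho = 1.2$ kg/m³). The sketch below also evaluates the hypothetical isothermal case, for which $pV = \mathrm{const}$ implies $K = p_0$; the variable names are illustrative.

```python
import math

gamma = 1.4     # adiabatic exponent of air
p0 = 1.0e5      # static pressure in Pa
rho = 1.2       # density in kg/m^3

c_adiabatic = math.sqrt(gamma * p0 / rho)   # c = sqrt(K / rho) with K = gamma * p0
c_isothermal = math.sqrt(p0 / rho)          # hypothetical isothermal medium, K = p0
relative_error = c_isothermal / c_adiabatic - 1.0

print(f"adiabatic: {c_adiabatic:.1f} m/s, isothermal: {c_isothermal:.1f} m/s, "
      f"deviation: {100 * relative_error:.1f} %")
```

With these rounded constants the adiabatic value lands close to the usual 343 m/s, while the isothermal assumption underestimates it by roughly 15 %, in line with the statement above.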


A.6.2 Potential and Kinetic Sound Energies, Intensity, 
Diffuseness 


The potential energy density or volume work stored in the elastic medium that gets compressed by a deformation $\mathrm{d}V$ increases with $\mathrm{d}w_p = p\,\mathrm{d}V$, while deformation also increases the pressure by $\mathrm{d}p = K\,\mathrm{d}V$. We may substitute $\mathrm{d}V = K^{-1}\mathrm{d}p$, yielding $\mathrm{d}w_p = K^{-1}p\,\mathrm{d}p$. The volume work stored by a pressure increase from 0 to $p$ is

$$w_p = \int_0^p \frac{p}{K}\,\mathrm{d}p = \frac{p^2}{2K} = \frac{p^2}{2\rho c^2}. \tag{A.77}$$

¹Typical constants are $\gamma = 1.4$, $p_0 = 10^5$ Pa, $\rho = 1.2$ kg/m³, $c = 343$ m/s.

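The kinetic energy density stored in the motion of the medium along any axis, e.g. $x$, increases by acceleration against its mass, $\mathrm{d}w_{v_x} = \rho\,\frac{\mathrm{d}v_x}{\mathrm{d}t}\,\mathrm{d}x$. The velocity is $v_x = \frac{\mathrm{d}x}{\mathrm{d}t}$, so that we substitute $\mathrm{d}x = v_x\,\mathrm{d}t$ to get $\mathrm{d}w_{v_x} = \rho\,v_x\,\mathrm{d}v_x$. The total kinetic energy density stored in velocities increasing from 0 to $v_x, v_y, v_z$ is

$$w_v = \rho\Big[\int_0^{v_x} v_x\,\mathrm{d}v_x + \int_0^{v_y} v_y\,\mathrm{d}v_y + \int_0^{v_z} v_z\,\mathrm{d}v_z\Big] = \rho\,\frac{v_x^2 + v_y^2 + v_z^2}{2} = \frac{\rho\,\|\mathbf{v}\|^2}{2}. \tag{A.78}$$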


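Total energy density and intensity. The total energy density therefore becomes

$$w = w_p + w_v = \frac{p^2}{2\rho c^2} + \frac{\rho\,\|\mathbf{v}\|^2}{2}, \tag{A.79}$$

and derived with regard to time, it becomes

$$\frac{\partial w}{\partial t} = p\,\underbrace{\frac{1}{\rho c^2}\frac{\partial p}{\partial t}}_{-\nabla^T\mathbf{v}} + \mathbf{v}^T\underbrace{\rho\,\frac{\partial\mathbf{v}}{\partial t}}_{-\nabla p} = -(p\,\nabla^T\mathbf{v} + \mathbf{v}^T\nabla p) = -\nabla^T(p\,\mathbf{v}) = -\nabla^T\mathbf{I}, \tag{A.80}$$

and defines the (time-domain) intensity vector $\mathbf{I} = p\,\mathbf{v}$ that describes the energy flow in space. Hereby, $\frac{\partial w}{\partial t} = -\nabla^T\mathbf{I}$ expresses that only a non-zero divergence of the intensity causes energy increase (source) or loss (absorption) in the lossless medium.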


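Direction of arrival and diffuseness: The intensity vector carries a meaning in its own right: it displays into which direction the energy flows (direction of emission). In the frequency domain, it becomes $\mathbf{I} = \mathrm{Re}\{p^*\mathbf{v}\}$, and for a plane-wave sound field $p = e^{\mathrm{i}k\,\boldsymbol{\theta}_s^T\mathbf{r}}$, where $\mathbf{v} = -\frac{\nabla p}{\mathrm{i}\omega\rho} = -\frac{\boldsymbol{\theta}_s}{\rho c}\,p$, it indicates the direction of arrival (DOA)

$$\mathbf{r}_\mathrm{DOA} = -\frac{\rho c\,\mathbf{I}}{|p|^2} = -\frac{\rho c\,\mathrm{Re}\{p^*\mathbf{v}\}}{|p|^2} = \frac{\rho c}{\rho c}\,\frac{|p|^2\,\boldsymbol{\theta}_s}{|p|^2} = \boldsymbol{\theta}_s. \tag{A.81}$$

An ideal, uniformly enveloping diffuse field is composed of uncorrelated plane waves, $E\{a(\boldsymbol{\theta}_1)^*\,a(\boldsymbol{\theta}_2)\} = \frac{|p|^2}{4\pi}\,\delta(1 - \boldsymbol{\theta}_1^T\boldsymbol{\theta}_2)$, resulting in the sound pressure $p = \int_{\mathbb{S}^2} a(\boldsymbol{\theta}_s)\,e^{\mathrm{i}k\,\boldsymbol{\theta}_s^T\mathbf{r}}\,\mathrm{d}\boldsymbol{\theta}_s$. While the expected squared sound pressure is non-zero as before, $E\{|p|^2\} = \frac{|p|^2}{4\pi}\int_{\mathbb{S}^2}\mathrm{d}\boldsymbol{\theta}_s = |p|^2$, the expected intensity of the uniformly surrounding waves vanishes, $-\rho c\,E\{\mathbf{I}\} = \frac{|p|^2}{4\pi}\int_{\mathbb{S}^2}\boldsymbol{\theta}_s\,\mathrm{d}\boldsymbol{\theta}_s = \mathbf{0}$.

Assuming stochastic interference of all sources, the intensity-based DOA estimator $\mathbf{r}_\mathrm{DOA} = -\frac{\rho c\,\mathrm{Re}\{p^*\mathbf{v}\}}{|p|^2}$ is therefore the physical equivalent to the $\mathbf{r}_\mathrm{E}$ vector measure.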




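A typical diffuseness measure $0 \le \psi \le 1$ relies on its length between 0 and 1

$$\psi = 1 - \|\mathbf{r}_\mathrm{DOA}\|^2. \tag{A.82}$$

The signals $W = p$ and $[X, Y, Z]^T = -\rho c\,\sqrt{2}\,\mathbf{v}$ of a first-order Ambisonic microphone allow us to describe a time-domain estimator $\mathbf{r}_\mathrm{DOA}$

$$\mathbf{r}_\mathrm{DOA} = -\frac{\rho c\,E\{\mathbf{I}\}}{E\{p^2\}} = -\frac{\rho c\,E\{p\,\mathbf{v}\}}{E\{p^2\}} = \frac{E\{W\,[X, Y, Z]^T\}}{\sqrt{2}\,E\{W^2\}}. \tag{A.83}$$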
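The estimator of Eq. (A.83) and the diffuseness of Eq. (A.82) can be illustrated with synthetic first-order signals. The sketch below assumes noise signals for the plane waves, uses $[X,Y,Z]^T = \sqrt{2}\,\boldsymbol{\theta}_s\,p$ per plane wave (which follows from $\mathbf{v} = -\boldsymbol{\theta}_s p/(\rho c)$), replaces expectations by sample means, and mimics a diffuse field crudely by six opposing directions. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000   # number of time samples

def foa_from_plane_waves(directions, signals):
    """W = p and [X, Y, Z] = sqrt(2) * sum_s theta_s * s_s for waves arriving from theta_s."""
    directions = np.asarray(directions, dtype=float)
    W = signals.sum(axis=0)
    XYZ = np.sqrt(2.0) * directions.T @ signals
    return W, XYZ

def r_doa(W, XYZ):
    """Time-domain estimator of Eq. (A.83) with sample means instead of expectations."""
    return (XYZ * W).mean(axis=1) / (np.sqrt(2.0) * (W ** 2).mean())

theta = np.array([[0.6, 0.8, 0.0]])                     # single unit-length DOA
W, XYZ = foa_from_plane_waves(theta, rng.standard_normal((1, n)))
r_single = r_doa(W, XYZ)
psi_single = 1.0 - r_single @ r_single                  # Eq. (A.82)

axes = np.vstack([np.eye(3), -np.eye(3)])               # six uncorrelated waves, opposing pairs
W, XYZ = foa_from_plane_waves(axes, rng.standard_normal((6, n)))
r_diffuse = r_doa(W, XYZ)
psi_diffuse = 1.0 - r_diffuse @ r_diffuse
```

For the single plane wave the estimate returns the DOA with a diffuseness near zero, whereas the opposing uncorrelated waves give a short vector and a diffuseness near one.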


A.6.3 Green’s Function in 3 Cartesian Dimensions 


We may compose the Green's function, the solution to the inhomogeneous wave equation

$$\Big(\Delta - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Big)\,G = -\delta(t)\,\delta(\mathbf{r}),$$

from products of complex exponentials with regard to time and the Cartesian directions

$$e^{\mathrm{i}\omega t + \mathrm{i}k_x x + \mathrm{i}k_y y + \mathrm{i}k_z z} = e^{\mathrm{i}\mathbf{k}^T\mathbf{r}}\,e^{\mathrm{i}\omega t}, \tag{A.84}$$

where the position $x, y, z$ was gathered in a position vector $\mathbf{r}$, and the wave numbers $k_x, k_y, k_z$ of the individual coordinates were gathered in a wave-number vector $\mathbf{k}$.


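From this solution, we compose the Green's function by superimposing all spatial and temporal complex exponentials in $\mathbf{k}$ and $\omega$, weighted by an unknown coefficient $\gamma$:

$$G = \iiiint \gamma\ e^{\mathrm{i}\mathbf{k}^T\mathbf{r}}\,e^{\mathrm{i}\omega t}\,\mathrm{d}\omega\,\mathrm{d}\mathbf{k}. \tag{A.85}$$

Because of $\Delta e^{\mathrm{i}\mathbf{k}^T\mathbf{r}} = (-k_x^2 - k_y^2 - k_z^2)\,e^{\mathrm{i}\mathbf{k}^T\mathbf{r}} = -k^2\,e^{\mathrm{i}\mathbf{k}^T\mathbf{r}}$ and $\frac{\partial^2}{\partial t^2}e^{\mathrm{i}\omega t} = -\omega^2\,e^{\mathrm{i}\omega t}$, insertion into the inhomogeneous wave equation yields

$$-\iiiint \gamma\,\Big[k^2 - \frac{\omega^2}{c^2}\Big]\,e^{\mathrm{i}\mathbf{k}^T\mathbf{r}}\,e^{\mathrm{i}\omega t}\,\mathrm{d}\omega\,\mathrm{d}\mathbf{k} = -\delta(t)\,\delta(\mathbf{r}).$$

Multiple transformations $\iiiint e^{-\mathrm{i}\mathbf{k}'^T\mathbf{r}}\,e^{-\mathrm{i}\omega' t}\,\mathrm{d}\mathbf{r}\,\mathrm{d}t$ remove the integrals by orthogonality

$$\iiiint \gamma\,\Big(k^2 - \frac{\omega^2}{c^2}\Big)\underbrace{\Big[\iiint e^{\mathrm{i}(\mathbf{k}-\mathbf{k}')^T\mathbf{r}}\,\mathrm{d}\mathbf{r}\Big]}_{(2\pi)^3\,\delta(\mathbf{k}-\mathbf{k}')}\ \underbrace{\Big[\int e^{\mathrm{i}(\omega-\omega')t}\,\mathrm{d}t\Big]}_{2\pi\,\delta(\omega-\omega')}\,\mathrm{d}\omega\,\mathrm{d}\mathbf{k} = \underbrace{\int\delta(t)\,e^{-\mathrm{i}\omega' t}\,\mathrm{d}t}_{1}\ \underbrace{\iiint\delta(\mathbf{r})\,e^{-\mathrm{i}\mathbf{k}'^T\mathbf{r}}\,\mathrm{d}\mathbf{r}}_{1},$$

and the unknown coefficient remains $\gamma = \frac{1}{(2\pi)^4}\,\frac{1}{k^2 - \omega^2/c^2}$. Leaving $G$ in the frequency domain, one factor $\frac{1}{2\pi}$ in $\gamma$ and the integral $\int e^{\mathrm{i}\omega t}\,\mathrm{d}\omega$ are omitted. We transform $\gamma$ back via $\mathbf{k}$

$$G = \frac{1}{(2\pi)^3}\iiint \frac{e^{\mathrm{i}\mathbf{k}^T\mathbf{r}}}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}\mathbf{k}.$$

By re-expressing $\mathbf{k}^T\mathbf{r} = kr\,\boldsymbol{\theta}_k^T\boldsymbol{\theta}_r = kr\cos\vartheta = kr\,\zeta$ we simplify the integral. Now we already see formally that Green's function can only depend on the distance $r$ between source and receiver location, $G = G(\omega, r)$.

The book [14, pp. 110-112] shows a notably compact derivation, which we will use below.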


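Derivation by transforming back from the Fourier domain: For three dimensions, the transformation back from the Fourier domain is relatively easy to accomplish. Before going into details, we recognize that the substitution of $\mathbf{k}^T\mathbf{r}$ by $kr\cos\vartheta$ contains the radius of the wave vector $k = \|\mathbf{k}\|$ and the cosine of the angle between $\mathbf{r}$ and $\mathbf{k}$. In $\mathbf{k}$ space, we can always define a correspondingly oriented coordinate system for any $\mathbf{r}$ as to simplify the integral $\iiint\mathrm{d}\mathbf{k} = \int_0^\infty\!\int_0^{2\pi}\!\int_{-1}^{1} k^2\,\mathrm{d}k\,\mathrm{d}\varphi\,\mathrm{d}\cos\vartheta = \int_0^\infty\!\int_0^{2\pi}\!\int_{-1}^{1} k^2\,\mathrm{d}k\,\mathrm{d}\varphi\,\mathrm{d}\zeta$. After re-arranging the integrals, we get

$$\begin{aligned}
G &= \frac{1}{(2\pi)^3}\int_0^\infty\!\int_{-1}^{1}\frac{e^{\mathrm{i}kr\zeta}}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}\zeta\int_0^{2\pi}\!\mathrm{d}\varphi\ k^2\,\mathrm{d}k = \frac{1}{(2\pi)^2}\int_0^\infty\frac{e^{\mathrm{i}kr} - e^{-\mathrm{i}kr}}{\mathrm{i}kr}\,\frac{k^2}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k \\
&= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\Big[\int_0^\infty\frac{e^{\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k - \int_0^\infty\frac{e^{-\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k\Big] \\
&= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\Big[\int_0^\infty\frac{e^{\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k - \int_0^{-\infty}\frac{e^{\mathrm{i}kr}\,(-k)}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}(-k)\Big] \\
&= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\Big[\int_0^\infty\frac{e^{\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k + \int_{-\infty}^{0}\frac{e^{\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k\Big] \\
&= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\int_{-\infty}^{\infty}\frac{e^{\mathrm{i}kr}}{k^2 - \frac{\omega^2}{c^2}}\,k\,\mathrm{d}k.
\end{aligned} \tag{A.86}$$

The fraction is expanded in partial fractions $\frac{k}{k^2 - \frac{\omega^2}{c^2}} = \frac{1}{2}\Big[\frac{1}{k - \frac{\omega}{c}} + \frac{1}{k + \frac{\omega}{c}}\Big]$, yielding

$$G = \frac{1}{2\,(2\pi)^2}\,\frac{1}{\mathrm{i}r}\Big[\int_{-\infty}^{\infty}\frac{e^{\mathrm{i}kr}}{k - \frac{\omega}{c}}\,\mathrm{d}k + \int_{-\infty}^{\infty}\frac{e^{\mathrm{i}kr}}{k + \frac{\omega}{c}}\,\mathrm{d}k\Big]. \tag{A.87}$$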


To obtain causal temporal solutions, there needs to be a specific solution of the improper and singular integrals.

Fig. A.1 Closure of improper integration path using $C_+$ over the positive imaginary half plane for $t > 0$, and $C_-$ for $t < 0$, and regularization of pole for causal result, i.e. non-zero only for $C_+$ (complex $\omega$ plane; on the lower semicircle, $e^{\mathrm{i}\omega t}$ is convergent for $t < 0$)


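Causal solutions (integral in $\omega$). Causal responses obeying the transformation

$$h(t) = \int_{-\infty}^{\infty}\frac{e^{\mathrm{i}\omega t}}{\omega - a}\,\mathrm{d}\omega \tag{A.88}$$

are obtained by replacing the improper integral $\int_{-\infty}^{\infty}$ by a closed integration contour, and by introducing vanishing regularization. Jordan's lemma states that improper integration $\int_{-\infty}^{\infty}$ is equivalent to a closed integration path $C_+$ of positive orientation involving the additional semi-circle on the upper half of the complex number plane, $\oint_{C_+} = \lim_{R\to\infty}\big[\int_{-R}^{R}\mathrm{d}\omega + \int_0^{\pi} R\,\mathrm{d}\varphi\big]$, if the integrand on the semi-circle vanishes, i.e. $\lim_{R\to\infty} e^{(\mathrm{i}\cos\varphi - \sin\varphi)\,R\,t} = 0$. This is the case for positive times $t > 0$. For negative times, the integral can be closed using the lower part of the complex number plane, $\oint_{C_-} = \lim_{R\to\infty}\big[\int_{-R}^{R}\mathrm{d}\omega + \int_0^{-\pi} R\,\mathrm{d}\varphi\big]$, if the semi-circular integral vanishes, which is true for negative times $t < 0$ in our case, see Fig. A.1. We get

$$h(t) = \begin{cases}\oint_{C_+}\frac{e^{\mathrm{i}\omega t}}{\omega - a}\,\mathrm{d}\omega, & \text{if } t > 0,\\[4pt] \oint_{C_-}\frac{e^{\mathrm{i}\omega t}}{\omega - a}\,\mathrm{d}\omega, & \text{if } t < 0.\end{cases} \tag{A.89}$$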


According to Cauchy's integral formula for analytic regular functions $f(z)$ over a single pole $a$, we obtain

$$\oint_{C_\pm}\frac{f(z)}{z - a}\,\mathrm{d}z = \begin{cases}\pm 2\pi\mathrm{i}\,f(a), & \text{if the path } C_\pm \text{ surrounds } a,\\ 0, & \text{if } a \text{ lies outside the path } C_\pm.\end{cases} \tag{A.90}$$

If a pole on the real axis $a \in \mathbb{R}$ is slightly shifted by a vanishing imaginary amount to $\lim_{\epsilon\to 0^+}\oint_{C_\pm}\frac{e^{\mathrm{i}\omega t}}{\omega - a - \mathrm{i}\epsilon}\,\mathrm{d}\omega$ (regularization) so that it lies within the path $C_+$ and not in $C_-$, the result is perfectly causal and vanishes at negative times: $h(t) = 2\pi\mathrm{i}\,\lim_{\epsilon\to 0^+} e^{\mathrm{i}(a + \mathrm{i}\epsilon)t}\,u(t)$, with the unit step function $u(t) = 1$ for $t \ge 0$ and 0 for $t < 0$, see Fig. A.1.

Fig. A.2 Causality-enforcing regularization $\omega = \lim_{\epsilon\to 0^+}\omega - \mathrm{i}\epsilon$ of the poles in Green's function w.r.t. wave number $k$. For a positive radius $r > 0$, closure of the improper integration path requires $C_+$ with semicircle on the positive imaginary half plane (or $C_-$ for $r < 0$)

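Integral in $k$. Causality requires specific regularization in frequency as shown above: Replacement of $\omega$ by $\lim_{\epsilon\to 0^+}\omega - \mathrm{i}\epsilon\,c$ guarantees causality in the partial-fraction expanded Green's function Eq. (A.87). Jordan's lemma requires to use the path $C_+$ to close the improper path in $k$ for a positive radius $r > 0$, cf. Fig. A.2,

$$G = \frac{1}{2\,(2\pi)^2}\lim_{\epsilon\to 0^+}\frac{1}{\mathrm{i}r}\Big[\oint_{C_+}\frac{e^{\mathrm{i}kr}}{k - \frac{\omega}{c} + \mathrm{i}\epsilon}\,\mathrm{d}k + \oint_{C_+}\frac{e^{\mathrm{i}kr}}{k + \frac{\omega}{c} - \mathrm{i}\epsilon}\,\mathrm{d}k\Big] = \frac{2\pi\mathrm{i}\,e^{-\mathrm{i}\frac{\omega}{c}r}}{2\,(2\pi)^2\,\mathrm{i}r} = \frac{e^{-\mathrm{i}\frac{\omega}{c}r}}{4\pi r}. \tag{A.91}$$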
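A quick numerical sanity check of Eq. (A.91): away from the source, $G = e^{-\mathrm{i}kr}/(4\pi r)$ with $k = \omega/c$ should satisfy the homogeneous Helmholtz equation $(\Delta + k^2)\,G = 0$. The sketch below assumes a spherically symmetric field, for which the Laplacian reduces to $\frac{1}{r}\frac{\mathrm{d}^2(rG)}{\mathrm{d}r^2}$, and uses central finite differences; frequency and step size are arbitrary choices.

```python
import numpy as np

c = 343.0                     # speed of sound in m/s
omega = 2 * np.pi * 500.0     # hypothetical angular frequency
k = omega / c

def green(r):
    """Free-field Green's function of Eq. (A.91)."""
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def r_times_green(r):
    return r * green(r)

r = np.linspace(0.5, 3.0, 6)  # radii away from the source at r = 0
h = 1e-3                      # finite-difference step in m

# Radial Laplacian of a spherically symmetric field: (1/r) d^2(r G)/dr^2
laplacian = (r_times_green(r + h) - 2 * r_times_green(r) + r_times_green(r - h)) / (h ** 2 * r)
residual = laplacian + k ** 2 * green(r)   # (Laplacian + k^2) G, expected to vanish for r > 0
relative_residual = np.abs(residual) / (k ** 2 * np.abs(green(r)))
```

The remaining residual is of the order of the finite-difference error and shrinks with the step size.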


A.6.4 Radial Solution of the Helmholtz Equation 


The radial part of the Helmholtz equation in spherical coordinates is characterized by the spherical Bessel differential equation in $x = kr$

$$y'' + 2x^{-1}\,y' + [1 - n(n+1)\,x^{-2}]\,y = 0. \tag{A.92}$$

Recursive construction. For $n = 0$, we know that the omnidirectional Green's function is a solution diverging from $x = 0$, and it is proportional to $y \propto \frac{e^{-\mathrm{i}x}}{x}$. We can simplify the equation by inserting $y = x^{-1}u_n$, which yields with $y' = x^{-1}u_n' - x^{-2}u_n$ and $y'' = x^{-1}u_n'' - 2x^{-2}u_n' + 2x^{-3}u_n$ after multiplying with $x$:

$$u_n'' - 2x^{-1}u_n' + 2x^{-2}u_n + 2x^{-1}u_n' - 2x^{-2}u_n + [1 - n(n+1)\,x^{-2}]\,u_n = 0$$
$$u_n'' + [1 - n(n+1)\,x^{-2}]\,u_n = 0. \tag{A.93}$$


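Moreover, we attempt to find a recursive definition for $n > 0$ using the approach

$$y_n = x^{-1}u_n, \qquad u_n = -x^{a}\,[x^{-a}u_{n-1}]'.$$

We evaluate the recursion for the derivatives

$$\begin{aligned}
u_n &= -u_{n-1}' + a\,x^{-1}u_{n-1},\\
u_n' &= -u_{n-1}'' + a\,x^{-1}u_{n-1}' - a\,x^{-2}u_{n-1}, \qquad\text{with } -u_{n-1}'' = [1 - n(n-1)\,x^{-2}]\,u_{n-1}\\
&= a\,x^{-1}u_{n-1}' + \{1 - [n(n-1) + a]\,x^{-2}\}\,u_{n-1},\\
u_n'' &= a\,x^{-1}u_{n-1}'' - a\,x^{-2}u_{n-1}' + \{1 - [n(n-1) + a]\,x^{-2}\}\,u_{n-1}' + 2\,[n(n-1) + a]\,x^{-3}u_{n-1}\\
&= \{1 - [n(n-1) + 2a]\,x^{-2}\}\,u_{n-1}' + \{[2n(n-1) + 2a + a\,n(n-1)]\,x^{-3} - a\,x^{-1}\}\,u_{n-1}\\
&= \{1 - [n(n-1) + 2a]\,x^{-2}\}\,u_{n-1}' + \{[n(n-1)(a+2) + 2a]\,x^{-3} - a\,x^{-1}\}\,u_{n-1}.
\end{aligned}$$

The equation $u_n'' + [1 - n(n+1)\,x^{-2}]\,u_n = 0$ using the above expressions becomes

$$\{1 - [n(n-1) + 2a]\,x^{-2}\}\,u_{n-1}' + \{[n(n-1)(a+2) + 2a]\,x^{-3} - a\,x^{-1}\}\,u_{n-1} + [1 - n(n+1)\,x^{-2}]\,[-u_{n-1}' + a\,x^{-1}u_{n-1}] = 0.$$

Comparing coefficients for $u_{n-1}'$ and $u_{n-1}$ yields $a = n$:

$$\begin{aligned}
u_{n-1}':&\quad 1 - 1 - [n(n-1) + 2a - n(n+1)]\,x^{-2} = 2(n - a)\,x^{-2} = 0,\\
u_{n-1}:&\quad [-a + a]\,x^{-1} + [n(n-1)(a+2) + 2a - a\,n(n+1)]\,x^{-3} = 0,\\
&\quad a\,n(n-1) + 2n(n-1) + 2a(1-n) - a\,n(n-1) = 2(n-1)(n-a) = 0,
\end{aligned}$$

and hereby a recurrence for $y_n$ from $u_n = -x^{n}\,[x^{-n}u_{n-1}]'$ with $y_n = x^{-1}u_n$, $u_n = x\,y_n$,

$$y_n = -x^{n-1}\,\frac{\mathrm{d}}{\mathrm{d}x}\big[x^{-n+1}\,y_{n-1}\big] \quad\Leftrightarrow\quad y_{n+1} = -x^{n}\,\frac{\mathrm{d}}{\mathrm{d}x}\big[x^{-n}\,y_n\big]. \tag{A.94}$$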
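The recurrence (A.94) can be checked against library implementations. Expanding the derivative gives $y_{n+1} = \frac{n}{x}\,y_n - y_n'$, which the following sketch evaluates with SciPy's `spherical_jn` and `spherical_yn` (both regular and singular real-valued solutions should obey it); the helper name `next_order` is hypothetical.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

x = np.linspace(1.0, 20.0, 200)

def next_order(n, y, dy):
    """y_{n+1} = -x^n d/dx [x^{-n} y_n] = (n / x) y_n - y_n', cf. Eq. (A.94)."""
    return n / x * y - dy

max_err = 0.0
for n in range(6):
    for f in (spherical_jn, spherical_yn):
        predicted = next_order(n, f(n, x), f(n, x, derivative=True))
        max_err = max(max_err, np.max(np.abs(predicted - f(n + 1, x))))
```

The maximum deviation `max_err` stays at the level of floating-point round-off.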


Singular and regular solution. We know from the Green's function that the omnidirectional solution should be proportional to $y_0 \propto \frac{e^{-\mathrm{i}x}}{x}$. The typical radial solution for an omnidirectional source field is chosen to be the spherical Hankel function of the second kind²

$$h_0^{(2)}(kr) = \frac{e^{-\mathrm{i}kr}}{-\mathrm{i}kr}, \qquad h_{n+1}^{(2)}(kr) = -(kr)^n\,\frac{\mathrm{d}}{\mathrm{d}(kr)}\Big[\frac{1}{(kr)^n}\,h_n^{(2)}(kr)\Big]. \tag{A.95}$$

However, this solution is not sufficient to solve problems without singularity at $r = 0$. We know that the function $\frac{\sin(kr)}{kr}$ is finite at $kr = 0$, and so are all real parts of the spherical Hankel functions of the second kind, the spherical Bessel functions

$$j_0(kr) = \frac{\sin(kr)}{kr}, \qquad j_{n+1}(kr) = -(kr)^n\,\frac{\mathrm{d}}{\mathrm{d}(kr)}\Big[\frac{1}{(kr)^n}\,j_n(kr)\Big]. \tag{A.96}$$


²Note that some scholars use the Fourier expansion $e^{\mathrm{i}\omega t}$ with opposite sign, $e^{-\mathrm{i}\omega t}$, and are required to use the complex conjugate in every expression containing imaginary constants, $h_n^{(1)} = h_n^{(2)*}$.

The solutions are linearly independent. One checks after some calculation that their Wronski determinant is non-zero [15, Eq. 10.50.1]

$$\begin{vmatrix} j_n(kr) & h_n^{(2)}(kr)\\ j_n'(kr) & h_n'^{(2)}(kr)\end{vmatrix} = j_n(kr)\,h_n'^{(2)}(kr) - j_n'(kr)\,h_n^{(2)}(kr) = \frac{1}{\mathrm{i}\,(kr)^2}. \tag{A.97}$$
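The Wronskian of Eq. (A.97) is easy to confirm numerically. The sketch below builds $h_n^{(2)} = j_n - \mathrm{i}\,y_n$ from SciPy's real-valued spherical Bessel and Neumann functions; the helper `h2` and the chosen order are illustrative.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def h2(n, x, derivative=False):
    """Spherical Hankel function of the second kind, h_n^(2) = j_n - i y_n."""
    return spherical_jn(n, x, derivative) - 1j * spherical_yn(n, x, derivative)

x = np.linspace(1.0, 15.0, 100)
n = 3   # arbitrary order for the check
wronskian = spherical_jn(n, x) * h2(n, x, True) - spherical_jn(n, x, True) * h2(n, x)
expected = 1.0 / (1j * x ** 2)
```

The same value $1/(\mathrm{i}x^2)$ is obtained for any order `n`, since the Wronskian does not depend on it.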


Below, the Frobenius method is shown as an alternative way to get these functions.


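Alternative way: Frobenius method. Given a second-order differential equation with singular coefficients, it can be solved by a generalized infinite power series:

$$y'' + \Big(\sum_{l=0}^{\infty} a_l\,x^l\Big)\,x^{-1}\,y' + \Big(\sum_{l=0}^{\infty} b_l\,x^l\Big)\,x^{-2}\,y = 0, \qquad\text{solution: } y = \sum_{k=0}^{\infty} c_k\,x^{k+\gamma}. \tag{A.98}$$

Insertion of the solution yields

$$\sum_{k=0}^{\infty}(k+\gamma-1)(k+\gamma)\,c_k\,x^{k+\gamma-2} + \sum_{k'=0}^{\infty}\sum_{l=0}^{\infty}\big[(k'+\gamma)\,a_l + b_l\big]\,c_{k'}\,x^{k'+l+\gamma-2} = 0;$$

an index shift $k' + l = k$ and $l = 0\ldots k$ allows to pull out the common factor $x^{k+\gamma-2}$

$$\sum_{k=0}^{\infty}x^{k+\gamma-2}\Big\{(k+\gamma-1)(k+\gamma)\,c_k + \sum_{l=0}^{k}\big[(k-l+\gamma)\,a_l + b_l\big]\,c_{k-l}\Big\} =$$
$$\sum_{k=0}^{\infty}x^{k+\gamma-2}\Big\{\big[(k+\gamma+a_0-1)(k+\gamma) + b_0\big]\,c_k + \sum_{l=1}^{k}\big[(k-l+\gamma)\,a_l + b_l\big]\,c_{k-l}\Big\} = 0.$$

The coefficient of every exponent of $x$ in the above equation must be zero:

$$\text{indicial equation for } k = 0:\quad \big[(\gamma + a_0 - 1)\,\gamma + b_0\big]\,c_0 = 0, \tag{A.99}$$
$$\text{indicial equation for } k = 1:\quad \big[(\gamma + a_0)(\gamma + 1) + b_0\big]\,c_1 + \big[a_1\gamma + b_1\big]\,c_0 = 0, \tag{A.100}$$
$$\text{recurrence for } k > 1:\quad -\frac{\sum_{l=1}^{k}\big[(k-l+\gamma)\,a_l + b_l\big]\,c_{k-l}}{(k+\gamma+a_0-1)(k+\gamma) + b_0} = c_k. \tag{A.101}$$

Depending on the specific values found for $\gamma$, the recurrence, etc., the Frobenius method suggests how to find or construct an independent pair of solutions.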


Spherical Bessel differential equation. In $y'' + 2x^{-1}y' + [-n(n+1) + x^2]\,x^{-2}\,y = 0$, all $a_l$ and $b_l$ are zero except $a_0 = 2$, $b_0 = -n(n+1)$, and $b_2 = 1$. Indicial equations and recurrence become

$$[\gamma(\gamma+1) - n(n+1)]\,c_0 = 0, \tag{A.102}$$
$$[(\gamma+1)(\gamma+2) - n(n+1)]\,c_1 = 0, \tag{A.103}$$
$$[(k+\gamma+1)(k+\gamma) - n(n+1)]\,c_k = -c_{k-2}. \tag{A.104}$$

We see that the recurrence is again a two-step recurrence, so that one can choose between an even solution using $c_0 \neq 0$, $c_1 = 0$ yielding $\gamma = n$ or $\gamma = -(n+1)$, [or an odd solution that won't be used, with $c_0 = 0$, $c_1 \neq 0$ yielding $\gamma + 1 = n$ or $\gamma + 1 = -(n+1)$].


Spherical Bessel functions. The choice $\gamma = n$ yields a solution converging everywhere: Powers of $x$ are all positive, the recurrences $c_k = \frac{-c_{k-2}}{(n+k+1)(n+k) - n(n+1)}$ yield a convergence radius $R = \lim_{k\to\infty}\big|\frac{c_{k-2}}{c_k}\big| = \lim_{k\to\infty}\,[(n+k+1)(n+k) - n(n+1)] = \infty$. With a starting value $c_0 = \frac{2^n\,n!}{(2n+1)!}$, solutions are called spherical Bessel functions [5, Chap. 3.4]

$$j_n(x) = (2x)^n\sum_{k=0}^{\infty}\frac{(-1)^k\,(n+k)!}{k!\,[2(n+k)+1]!}\,x^{2k}, \tag{A.105}$$

which are a physical set of regular solutions with $n$-fold zero at 0. The spherical Bessel function for $n = 0$ is

$$j_0(x) = \frac{\sin x}{x}. \tag{A.106}$$

With the above recursive definition iterated [5, Eq. 3.4.15] one could define

$$j_{n+1} = -x^n\,\frac{\mathrm{d}}{\mathrm{d}x}\Big(\frac{1}{x^n}\,j_n\Big), \qquad j_n = (-x)^n\Big(\frac{1}{x}\frac{\mathrm{d}}{\mathrm{d}x}\Big)^n j_0. \tag{A.107}$$
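The power series (A.105) can be compared with a library implementation. The sketch below assumes the series in the form $j_n(x) = (2x)^n\sum_k \frac{(-1)^k (n+k)!}{k!\,[2(n+k)+1]!}\,x^{2k}$ and truncates it after a fixed, arbitrarily chosen number of terms; the function name is hypothetical.

```python
import math
from scipy.special import spherical_jn

def spherical_jn_series(n, x, terms=30):
    """Truncated power series of Eq. (A.105)."""
    total = 0.0
    for k in range(terms):
        total += ((-1) ** k * math.factorial(n + k)
                  / (math.factorial(k) * math.factorial(2 * (n + k) + 1))
                  * x ** (2 * k))
    return (2 * x) ** n * total

errors = [abs(spherical_jn_series(n, x) - spherical_jn(n, x))
          for n in range(5) for x in (0.1, 1.0, 5.0)]
```

For moderate arguments the truncated series agrees with `scipy.special.spherical_jn` to round-off; for large arguments a direct series evaluation loses accuracy by cancellation, which is one reason why library routines use recurrences instead.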


Spherical Neumann functions. With $\gamma = -(n+1)$ and $c_0 \neq 0$, $c_1 = 0$, the recurrences are $c_k = \frac{-c_{k-2}}{(k-n)(k-n-1) - n(n+1)}$ and yield the spherical Neumann functions $y_n$ with an $(n+1)$-fold pole at 0. They also obey the recursive definition from above,

$$y_0 = -\frac{\cos x}{x}, \qquad y_n = (-x)^n\Big(\frac{1}{x}\frac{\mathrm{d}}{\mathrm{d}x}\Big)^n y_0. \tag{A.108}$$
x d 


Spherical Hankel functions. The spherical Neumann and Bessel functions based on either sin or cos are clearly linearly independent. The spherical Bessel functions are useful for representing fields convergent everywhere. Physical source fields (Green's function) diverge at the source location and exhibit a specific way phase radiates, with $G \propto \frac{e^{-\mathrm{i}kr}}{r}$. The spherical Bessel and Neumann functions are asymptotically similar to [15, Eq. 10.52.3]

$$\lim_{x\to\infty} j_n = \frac{\sin\!\big(x - \tfrac{n\pi}{2}\big)}{x}, \qquad \lim_{x\to\infty} y_n = -\frac{\cos\!\big(x - \tfrac{n\pi}{2}\big)}{x}, \tag{A.109}$$

therefore only their combination to spherical Hankel functions of the second kind

$$h_n^{(2)} = j_n - \mathrm{i}\,y_n \tag{A.110}$$

yields a useful physical set of singular solutions. They inherit their $(n+1)$-fold pole from the spherical Neumann functions at 0. Their limiting form for large arguments is

$$\begin{aligned}
\lim_{x\to\infty} h_n^{(2)}(x) &= -x^{n-1}\lim_{x\to\infty}\frac{\mathrm{d}}{\mathrm{d}x}\Big(\frac{1}{x^{n-1}}\,h_{n-1}^{(2)}\Big)\\
&= -x^{n-1}\lim_{x\to\infty}\Big(-\frac{n-1}{x^{n}}\,h_{n-1}^{(2)} + \frac{1}{x^{n-1}}\frac{\mathrm{d}}{\mathrm{d}x}h_{n-1}^{(2)}\Big) = -\lim_{x\to\infty}\frac{\mathrm{d}}{\mathrm{d}x}h_{n-1}^{(2)}\\
&= (-1)^n\lim_{x\to\infty}\frac{\mathrm{d}^n}{\mathrm{d}x^n}h_0^{(2)} = \mathrm{i}^n\,h_0^{(2)}(x).
\end{aligned} \tag{A.111}$$

With Eqs. (A.110), (A.106) and (A.108), the zeroth-order spherical Hankel function is $h_0^{(2)}(x) = \frac{e^{-\mathrm{i}x}}{-\mathrm{i}x}$.


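Alternative implementation by cylindrical functions. We can transform the spherical Bessel differential equation by inserting $y = x^{\alpha}u$ and obtain after division by $x^{\alpha}$

$$x^{\alpha}u'' + 2\alpha\,x^{\alpha-1}u' + \alpha(\alpha-1)\,x^{\alpha-2}u + 2x^{\alpha-1}u' + 2\alpha\,x^{\alpha-2}u + [1 - n(n+1)\,x^{-2}]\,x^{\alpha}u = 0$$
$$u'' + \frac{2(\alpha+1)}{x}\,u' + \Big[1 + \frac{\alpha(\alpha+1) - n(n+1)}{x^2}\Big]\,u = 0.$$

For $\alpha = -\frac{1}{2}$, the equation for $u$ becomes the Bessel differential equation with $\alpha(\alpha+1) - n(n+1) = -\big(\frac{1}{4} + n^2 + n\big) = -\big(n + \frac{1}{2}\big)^2$,

$$u'' + \frac{1}{x}\,u' + \Big[1 - \frac{(n+\frac{1}{2})^2}{x^2}\Big]\,u = 0. \tag{A.112}$$

Consequently, the spherical Bessel functions and spherical Hankel functions of the second kind can be implemented using the Bessel and Hankel functions that can be found in any standard maths programming library. The specific relations are:

$$j_n(x) = \sqrt{\frac{\pi}{2x}}\,J_{n+\frac{1}{2}}(x), \qquad h_n^{(2)}(x) = \sqrt{\frac{\pi}{2x}}\,H^{(2)}_{n+\frac{1}{2}}(x). \tag{A.113}$$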
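As an illustration of Eq. (A.113), the sketch below implements both relations with SciPy's cylindrical functions `jv` and `hankel2` and compares them with the dedicated spherical routines. The wrapper names are hypothetical.

```python
import numpy as np
from scipy.special import jv, hankel2, spherical_jn, spherical_yn

def spherical_jn_from_cylindrical(n, x):
    """j_n(x) = sqrt(pi / (2x)) J_{n+1/2}(x), cf. Eq. (A.113)."""
    return np.sqrt(np.pi / (2 * x)) * jv(n + 0.5, x)

def spherical_h2_from_cylindrical(n, x):
    """h_n^(2)(x) = sqrt(pi / (2x)) H^(2)_{n+1/2}(x), cf. Eq. (A.113)."""
    return np.sqrt(np.pi / (2 * x)) * hankel2(n + 0.5, x)

x = np.linspace(0.5, 25.0, 50)
```

Where a library already offers `spherical_jn` and `spherical_yn`, those are the more direct choice; the cylindrical route is mainly useful in environments that only provide $J_\nu$ and $H^{(2)}_\nu$.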
A.6.5 Green’s Function in Spherical Solutions, Angular
Distributions, Plane Waves 


We can write the inhomogeneous Helmholtz equation $(\Delta + k^2)\,G = -\delta$ to be excited by a source at the direction $\boldsymbol{\theta}_0$ at the radius $r_0$. We decompose the excitation into a Delta function in radius and direction, $-r_0^{-2}\,\delta(r - r_0)\,\delta(\boldsymbol{\theta}_0^T\boldsymbol{\theta} - 1)$. The directional part need not be restricted to the spherical Dirac delta function, so we can take a distribution of sources at $r_0$, weighted by the panning function $g(\boldsymbol{\theta})$,

$$(\Delta + k^2)\,p = -r_0^{-2}\,\delta(r - r_0)\,g(\boldsymbol{\theta}). \tag{A.114}$$

From the spherical basis solutions, we know that at a radius other than $r_0$, $p$ can be expanded into spherical harmonics

$$p = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\psi_{nm}\,Y_n^m(\boldsymbol{\theta}). \tag{A.115}$$


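Acting on the decomposition of $p$, the directional part of the Laplacian will yield the eigenvalue $\Delta_{\boldsymbol{\theta}}\,Y_n^m = -n(n+1)\,r^{-2}\,Y_n^m$ of the spherical harmonics, and its radial part is $\Delta_r = \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}$, as around Eq. (6.11), hence

$$(\Delta + k^2)\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\psi_{nm}\,Y_n^m(\boldsymbol{\theta}) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\Big[\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} - \frac{n(n+1)}{r^2} + k^2\Big]\,\psi_{nm}\,Y_n^m(\boldsymbol{\theta}) = -r_0^{-2}\,\delta(r - r_0)\,g(\boldsymbol{\theta}).$$

Obviously, $\psi_{nm}$ must depend on $k$ and $r$, so we may pull the factor $k$ into the differentials $\frac{\partial}{\partial r} = k\,\frac{\partial}{\partial (kr)}$ to get the differential operator $k^2\Big[\frac{\partial^2}{\partial (kr)^2} + \frac{2}{kr}\frac{\partial}{\partial (kr)} + 1 - \frac{n(n+1)}{(kr)^2}\Big]$ and observe $kr$ as its variable on the left, and we replace $kr$ by $x$ for brevity. Applying the factor $k^{-2}$ and the spherical harmonics transform $\int_{\mathbb{S}^2} Y_n^m(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta}$ on the equation removes the double sum on the left (orthogonality) and decomposes the panning function $g(\boldsymbol{\theta})$ on the right into $\gamma_{nm}$,

$$\Big[\frac{\mathrm{d}^2}{\mathrm{d}x^2} + \frac{2}{x}\frac{\mathrm{d}}{\mathrm{d}x} + 1 - \frac{n(n+1)}{x^2}\Big]\,\psi_{nm} = -(k r_0)^{-2}\,\delta(r - r_0)\,\gamma_{nm}.$$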

We collect the $x$-independent term $\gamma_{nm}$ as factors of the solution $\psi_{nm} = y\,\gamma_{nm}$ and get, with $x_0 = k r_0$,

$$y'' + \frac{2}{x}\,y' + \Big[1 - \frac{n(n+1)}{x^2}\Big]\,y = -x_0^{-2}\,\delta(r - r_0), \tag{A.116}$$

the inhomogeneous spherical Bessel differential equation. As described, e.g., in [16, 17], the inhomogeneous differential equation can be solved by the Lagrangian variation of the parameters for equations of the type $y'' + p\,y' + q\,y = r$, knowing its independent homogeneous solutions $y_1 = h_n^{(2)}(x)$ and $y_2 = j_n(x)$.

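It uses a solution $y = u\,y_1 + v\,y_2$ with variable parameters $u$ and $v$, which upon first and second-order differentiation becomes

$$y' = u\,y_1' + v\,y_2' + u'\,y_1 + v'\,y_2,$$
$$y'' = u\,y_1'' + 2u'\,y_1' + u''\,y_1 + v\,y_2'' + 2v'\,y_2' + v''\,y_2.$$

Inserted into the equation $y'' + p\,y' + q\,y = r$, this yields

$$u\,\underbrace{(y_1'' + p\,y_1' + q\,y_1)}_{=0} + v\,\underbrace{(y_2'' + p\,y_2' + q\,y_2)}_{=0} + u''\,y_1 + 2u'\,y_1' + v''\,y_2 + 2v'\,y_2' + p\,(u'\,y_1 + v'\,y_2) =$$
$$(u'\,y_1 + v'\,y_2)' + u'\,y_1' + v'\,y_2' + p\,(u'\,y_1 + v'\,y_2) = u'\,y_1' + v'\,y_2' + \Big(\frac{\mathrm{d}}{\mathrm{d}x} + p\Big)(u'\,y_1 + v'\,y_2) = r.$$

Now two functions $u$ and $v$ are to be determined from only one equation, so we may pose an additional constraint. The above equation would simplify if the term $(u'\,y_1 + v'\,y_2)$ vanished. By this and the simplified equation, we get two conditions

$$\mathrm{I}:\quad u'\,y_1 + v'\,y_2 = 0,$$
$$\mathrm{II}:\quad u'\,y_1' + v'\,y_2' = r,$$

and obtain by elimination with either $\mathrm{A} = \mathrm{I}\,y_1' - \mathrm{II}\,y_1$ or $\mathrm{B} = \mathrm{I}\,y_2' - \mathrm{II}\,y_2$, using the Wronskian $W = y_1'\,y_2 - y_1\,y_2'$,

$$\mathrm{A}:\quad v'\,\underbrace{(y_1'\,y_2 - y_1\,y_2')}_{W} = -r\,y_1,$$
$$\mathrm{B}:\quad u'\,\underbrace{(y_1\,y_2' - y_1'\,y_2)}_{-W} = -r\,y_2.$$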

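So that the solution $y = u\,y_1 + v\,y_2$ uses $u = \int\frac{r\,y_2}{W}\,\mathrm{d}x$ and $v = -\int\frac{r\,y_1}{W}\,\mathrm{d}x$. In our case, we have $y_1 = h_n^{(2)}(x)$, $y_2 = j_n(x)$, $r = -x_0^{-2}\,\delta(r - r_0)$, and the Wronskian $W = y_1'\,y_2 - y_1\,y_2' = (\mathrm{i}x^2)^{-1}$ from Eq. (A.97), hence with integration constants enforcing the physical solutions:

$$y = -h_n^{(2)}(x)\int_0^{x}\mathrm{i}x^2\,j_n(x)\,x_0^{-2}\,\delta(r - r_0)\,\mathrm{d}x - j_n(x)\int_x^{\infty}\mathrm{i}x^2\,h_n^{(2)}(x)\,x_0^{-2}\,\delta(r - r_0)\,\mathrm{d}x.$$

To convert $\delta(r - r_0)$ into $\delta(x - x_0)$ with $x = kr$, we use $\int\delta(x)\,\mathrm{d}x = \int\delta(r)\,\mathrm{d}r = 1$ with the integration variable replaced, $\frac{\mathrm{d}x}{\mathrm{d}r} = k$, hence $\mathrm{d}x = k\,\mathrm{d}r$, and obviously by $\int\delta(x)\,k\,\mathrm{d}r = \int\delta(r)\,\mathrm{d}r$ we find $\delta(r) = k\,\delta(x)$,

$$y = -h_n^{(2)}(x)\int_0^{x}\mathrm{i}x^2\,j_n(x)\,k\,x_0^{-2}\,\delta(x - x_0)\,\mathrm{d}x - j_n(x)\int_x^{\infty}\mathrm{i}x^2\,h_n^{(2)}(x)\,k\,x_0^{-2}\,\delta(x - x_0)\,\mathrm{d}x$$
$$= -\mathrm{i}k\begin{cases} j_n(x_0)\,h_n^{(2)}(x), & \text{for } x \ge x_0,\\ j_n(x)\,h_n^{(2)}(x_0), & \text{for } x \le x_0.\end{cases}$$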

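The solution becomes after re-substituting $x = kr$ and expanding $\psi_{nm} = y\,\gamma_{nm}$ over the spherical harmonics $p = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\psi_{nm}\,Y_n^m(\boldsymbol{\theta})$:

$$p = -\mathrm{i}k\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\gamma_{nm}\,Y_n^m(\boldsymbol{\theta})\begin{cases} h_n^{(2)}(kr)\,j_n(kr_0), & \text{for } r \ge r_0,\\ j_n(kr)\,h_n^{(2)}(kr_0), & \text{for } r \le r_0.\end{cases} \tag{A.117}$$

Green's function. For the Green's function at the direction $\boldsymbol{\theta}_0$, the angular panning function is expanded as $\gamma_{nm} = Y_n^m(\boldsymbol{\theta}_0)$, and we get the formulation of the Green's function in terms of spherical basis functions:

$$G = -\mathrm{i}k\sum_{n=0}^{\infty}\sum_{m=-n}^{n}Y_n^m(\boldsymbol{\theta})\,Y_n^m(\boldsymbol{\theta}_0)\begin{cases} h_n^{(2)}(kr)\,j_n(kr_0), & \text{for } r \ge r_0,\\ j_n(kr)\,h_n^{(2)}(kr_0), & \text{for } r \le r_0.\end{cases} \tag{A.118}$$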
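Equation (A.118) can be compared numerically with the closed form $e^{-\mathrm{i}kd}/(4\pi d)$, where $d$ is the distance between source and receiver. The sketch below assumes orthonormal real spherical harmonics, so that the sum over $m$ collapses by the addition theorem, $\sum_m Y_n^m(\boldsymbol{\theta})\,Y_n^m(\boldsymbol{\theta}_0) = \frac{2n+1}{4\pi}P_n(\boldsymbol{\theta}^T\boldsymbol{\theta}_0)$; truncation order and function names are illustrative choices.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def green_series(k, r, r0, cos_angle, order=40):
    """Truncated Eq. (A.118), with the sum over m replaced by (2n+1)/(4 pi) P_n(cos_angle)."""
    r_small, r_large = min(r, r0), max(r, r0)
    n = np.arange(order + 1)
    h2 = spherical_jn(n, k * r_large) - 1j * spherical_yn(n, k * r_large)
    terms = ((2 * n + 1) / (4 * np.pi) * eval_legendre(n, cos_angle)
             * spherical_jn(n, k * r_small) * h2)
    return -1j * k * terms.sum()

def green_closed_form(k, r, r0, cos_angle):
    """exp(-i k d) / (4 pi d) with the distance d between source and receiver."""
    d = np.sqrt(r ** 2 + r0 ** 2 - 2 * r * r0 * cos_angle)
    return np.exp(-1j * k * d) / (4 * np.pi * d)
```

The series converges geometrically with the ratio of the smaller to the larger radius, so it needs more terms as the receiver approaches the source radius.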


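Plane waves/far field approximation. Equation (6.7) in Sect. 6.3.1 formulates plane waves $p = e^{\mathrm{i}k\,\boldsymbol{\theta}_0^T\mathbf{r}}$ as far-field limit $p = 4\pi\lim_{r_0\to\infty} r_0\,e^{\mathrm{i}kr_0}\,G = 4\pi\lim_{r_0\to\infty}\frac{G}{-\mathrm{i}k\,h_0^{(2)}(kr_0)}$. Using Eq. (A.117), a distribution of plane waves driven by the gains $g(\boldsymbol{\theta}) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\gamma_{nm}\,Y_n^m(\boldsymbol{\theta})$ consequently yields with $\lim_{r_0\to\infty} h_n^{(2)}(kr_0) = \mathrm{i}^n\,h_0^{(2)}(kr_0)$,

$$p = 4\pi\sum_{n=0}^{\infty}\sum_{m=-n}^{n} j_n(kr)\Big[\lim_{r_0\to\infty}\frac{h_n^{(2)}(kr_0)}{h_0^{(2)}(kr_0)}\Big]\,Y_n^m(\boldsymbol{\theta})\,\gamma_{nm} = 4\pi\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\mathrm{i}^n\,j_n(kr)\,Y_n^m(\boldsymbol{\theta})\,\gamma_{nm}, \tag{A.119}$$

or for a single plane-wave direction $\gamma_{nm} = Y_n^m(\boldsymbol{\theta}_0)$

$$p = 4\pi\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\mathrm{i}^n\,j_n(kr)\,Y_n^m(\boldsymbol{\theta})\,Y_n^m(\boldsymbol{\theta}_0). \tag{A.120}$$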
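The plane-wave expansion (A.120) can be verified in the same way. The sketch below again assumes the addition theorem $\sum_m Y_n^m(\boldsymbol{\theta})\,Y_n^m(\boldsymbol{\theta}_0) = \frac{2n+1}{4\pi}P_n(\boldsymbol{\theta}^T\boldsymbol{\theta}_0)$ for orthonormal spherical harmonics, so that the factor $4\pi$ cancels; the truncation order is an arbitrary choice.

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

def plane_wave_series(kr, cos_angle, order=40):
    """Truncated Eq. (A.120): sum_n (2n+1) i^n j_n(kr) P_n(cos_angle)."""
    n = np.arange(order + 1)
    return np.sum((2 * n + 1) * 1j ** n * spherical_jn(n, kr) * eval_legendre(n, cos_angle))

kr, cos_angle = 5.0, 0.3                       # hypothetical evaluation point
p_series = plane_wave_series(kr, cos_angle)
p_exact = np.exp(1j * kr * cos_angle)          # e^{i k theta_0^T r}
```

Since $j_n(kr)$ decays rapidly for $n > kr$, a truncation order somewhat above $kr$ is usually sufficient.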


A.7 Sine and Tangent Law 


The sine and tangent law [18] observes the sound pressure of plane waves at two locations $x = 0$, $y = \pm d$ at ear distance in order to simulate the ear signals. A plane wave from the left half of the room from the angle $\varphi > 0$ first arrives at the left ear, $p_\mathrm{left} = e^{\mathrm{i}kd\sin\varphi}$, and later on the right one, $p_\mathrm{right} = e^{-\mathrm{i}kd\sin\varphi}$. The phase difference is $\Phi_\varphi = 2kd\sin\varphi$.

A superimposed pair of plane waves from the directions $\pm\alpha$ arrives at the left ear as $p_\mathrm{left} = g_1\,e^{\mathrm{i}kd\sin\alpha} + g_2\,e^{-\mathrm{i}kd\sin\alpha}$ and at the right as $p_\mathrm{right} = g_1\,e^{-\mathrm{i}kd\sin\alpha} + g_2\,e^{\mathrm{i}kd\sin\alpha} = p_\mathrm{left}^*$. The phase difference $\Phi_{\pm\alpha} = 2\angle p_\mathrm{left} = 2\arctan\frac{(g_1 - g_2)\sin(kd\sin\alpha)}{(g_1 + g_2)\cos(kd\sin\alpha)}$ can be linearized for long wave lengths $kd \to 0$ to $\Phi_{\pm\alpha} \approx 2\arctan\Big(kd\,\frac{g_1 - g_2}{g_1 + g_2}\sin\alpha\Big) \approx 2\,\frac{g_1 - g_2}{g_1 + g_2}\,kd\sin\alpha$.

Comparing the phase difference of the single plane wave with the one of the superimposed pair, $2kd\sin\varphi = 2kd\,\frac{g_1 - g_2}{g_1 + g_2}\sin\alpha$, one arrives at the sine law

$$\sin\varphi = \frac{g_1 - g_2}{g_1 + g_2}\,\sin\alpha.$$


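If we claim our hearing to possess the ability to not only estimate the interaural phase difference $\Phi$ but also its derivative with regard to head rotation $\frac{\partial\Phi}{\partial\delta}$, we arrive at a value pair of binaural features $\big(\Phi_\varphi, \frac{\partial\Phi_\varphi}{\partial\delta}\big) = 2kd\,(\sin\varphi, \cos\varphi)$ that should match the one of the stereophonic plane-wave pair. For stereo, the phase difference derived with regard to head rotation is $\frac{\partial\Phi_{\pm\alpha}}{\partial\delta} \approx 2kd\,\frac{\partial}{\partial\delta}\frac{g_1\sin(\alpha + \delta) - g_2\sin(\alpha - \delta)}{g_1 + g_2}\Big|_{\delta=0} = 2kd\,\frac{g_1 + g_2}{g_1 + g_2}\cos\alpha = 2kd\cos\alpha$, and $\big(\Phi_{\pm\alpha}, \frac{\partial\Phi_{\pm\alpha}}{\partial\delta}\big) = 2kd\,\big(\frac{g_1 - g_2}{g_1 + g_2}\sin\alpha,\ \cos\alpha\big)$.

In polar coordinates, the radius of both value pairs differs. While the plane wave yields a value pair at the radius $2kd$ in the binaural feature space, the stereophonic wave pair is of the radius $2kd$ only at $\pm\alpha$, at which one of the two gains must vanish, and amplitude panning can be used to connect these two points $2kd\,(\pm\sin\alpha, \cos\alpha)$ by a straight line. The plane wave with the most similar feature pair must lie on the same polar angle. We may equate the tangents of both points, $\frac{\sin\varphi}{\cos\varphi} = \frac{g_1 - g_2}{g_1 + g_2}\,\frac{\sin\alpha}{\cos\alpha}$, and obtain the tangent law:

$$\tan\varphi = \frac{g_1 - g_2}{g_1 + g_2}\,\tan\alpha.$$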
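Both panning laws are one-liners in code. The following sketch, with hypothetical function names and example gains, evaluates them for a loudspeaker pair at $\pm 30°$ and compares the exact phase difference of the pair with that of a single plane wave from the sine-law angle at a small value of $kd$.

```python
import numpy as np

def sine_law_angle(g1, g2, alpha):
    """Angle from sin(phi) = (g1 - g2) / (g1 + g2) * sin(alpha)."""
    return np.arcsin((g1 - g2) / (g1 + g2) * np.sin(alpha))

def tangent_law_angle(g1, g2, alpha):
    """Angle from tan(phi) = (g1 - g2) / (g1 + g2) * tan(alpha)."""
    return np.arctan((g1 - g2) / (g1 + g2) * np.tan(alpha))

def stereo_phase_difference(g1, g2, alpha, kd):
    """Exact interaural phase difference 2 * angle(p_left) of the pair at +-alpha."""
    p_left = g1 * np.exp(1j * kd * np.sin(alpha)) + g2 * np.exp(-1j * kd * np.sin(alpha))
    return 2 * np.angle(p_left)

alpha = np.radians(30.0)   # loudspeaker half-angle, chosen for illustration
g1, g2 = 1.0, 0.5          # hypothetical loudspeaker gains
kd = 0.05                  # long-wavelength regime, kd << 1

phi_sine = sine_law_angle(g1, g2, alpha)
phi_tangent = tangent_law_angle(g1, g2, alpha)
phase_pair = stereo_phase_difference(g1, g2, alpha, kd)
phase_plane_wave = 2 * kd * np.sin(phi_sine)
```

For gains between the extremes, the tangent law predicts a slightly larger angle than the sine law, while both agree at the centre ($g_1 = g_2$) and at the loudspeakers (one gain zero).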


If what is searched is not the angle of a plane wave with the closest features to those of a given amplitude difference, but rather the amplitude difference whose features are closest to those of a given plane wave, then the sine law is the best match, even in the two-dimensional feature space.